Why is Map implemented with a red-black tree and not a hashtable?

Why did they implement Map with a red-black tree, which has O(log n) complexity, and not with a hashtable, which has O(1) complexity?

Related

Haskell data structure with O(1) indexing/updating and O(log n) cons

Currently I need an immutable data structure (indexed by Int) with fast reads and updates (effectively O(1)); somewhat slower cons, such as O(log n), is acceptable. Other operations are not necessary. What I have considered so far:
Data.IntMap, which is a radix tree. It is effectively O(1) for everything (more precisely, O(min(n, W)) for word size W) and is currently the best choice.
Data.Sequence, which is a finger tree. It has worst-case O(log n) lookup/update but O(1) cons. That is not quite what I need, and indeed it performs worse than IntMap in my microbenchmarks.
Data.HashMap, which has O(log n) for everything and performs worse than IntMap.
My question is: are there any Haskell data structures that can perform better than IntMap in this regard?
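For reference, a minimal sketch of the Data.IntMap pattern described above (the names and values here are illustrative):

```haskell
import qualified Data.IntMap.Strict as IM

-- Build once from (Int, value) pairs.
table :: IM.IntMap String
table = IM.fromList [(0, "zero"), (1, "one"), (2, "two")]

-- Lookup and insert are O(min(n, W)) on the radix tree
-- (W = word size), i.e. effectively constant time.
readAt :: Int -> Maybe String
readAt i = IM.lookup i table

-- Updates are persistent: the original table is unchanged.
updated :: IM.IntMap String
updated = IM.insert 1 "uno" table
```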

Haskell - efficient equivalent for hash-based dictionary [duplicate]

I'm looking for a monad-free associative array with constant-time, O(1) query access.
Consider the hypothetical type:
data HT k v = ???
I want to construct an immutable structure once:
fromList :: (Foldable t, Hashable k) => t (k, v) -> HT k v
I want to subsequently query it repeatedly with constant-time access:
lookup :: Hashable k => HT k v -> k -> Maybe v
There appear to be two candidate libraries, both of which fall short:
unordered-containers
hashtables
unordered-containers
unordered-containers contains both strict and lazy variants of the HashMap type. Both have O(log n) queries, as documented for the lookup function. This query time follows from the construction of the HashMap types, which use an internal tree structure that allows O(log n) inserts. That is an understandable trade-off for many use cases, but since I don't need a mutable HashMap, it hampers mine.
hashtables
hashtables contains a HashTable type class and three instance types with varying table-construction strategies. The type class specifies a constant-time O(1) lookup function, but it is permanently embedded in the ST monad. There is no way to "freeze" the stateful HashTable implementations and obtain a lookup function that is not embedded in a stateful monad. The type-class interface is well designed when the entire computation is wrapped in a state monad, but that design is unsuitable for my use case.
Does there exist some other library which defines types and functions which can construct an immutable constant access query O(1) associative array that is not embedded in a stateful monad?
Does there exist some way to wrap or modify these existing hashing-based libraries to produce an immutable constant access query O(1) associative array that is not embedded in a stateful monad?
The library you want is… unordered-containers. Or just plain old Data.Map from containers, if you’d prefer.
The note in the unordered-containers documentation explains why you shouldn’t worry about the O(log n) time complexity for lookups:
Many operations have an average-case complexity of O(log n). The implementation uses a large base (i.e. 16) so in practice these operations are constant time.
This is a common practice with certain kinds of functional data structures because it allows good sharing properties while also having good time complexities. log16 still produces very small numbers even for very large n, so you can almost always treat those complexities as “effectively constant time”.
If this is ever a bottleneck for your application, sure, go with something else, but I find that highly unlikely. After all, log16(1,000,000) is a little under 5, so your lookup time is not going to grow very quickly. Processing all that data is going to take up much more time than the overhead of the lookup.
As always: profile first. If you somehow have a problem that absolutely needs the fastest possible hash map in the world, you might need an imperative hash map, but for every case I’ve ever had, the functional ones work just fine.
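A minimal sketch of that usage with Data.HashMap.Strict (the keys and values here are illustrative):

```haskell
import qualified Data.HashMap.Strict as HM

-- Built once from a list; lookups thereafter are O(log16 n),
-- i.e. a handful of node hops even for millions of keys.
ages :: HM.HashMap String Int
ages = HM.fromList [("alice", 42), ("bob", 7)]

lookupAge :: String -> Maybe Int
lookupAge name = HM.lookup name ages
```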
You should follow Alexis' suggestion and use unordered-containers. If you really want something that is guaranteed to have Θ(1) lookups, you can define your own frozen version of any of the hash table types from hashtables using unsafePerformIO, but this is not very elegant. For example:
module HT
  ( HT
  , fromList
  , lookup
  ) where

import qualified Data.HashTable.IO as H
import Data.Hashable (Hashable)
import Data.Foldable (toList)
import System.IO.Unsafe (unsafePerformIO)
import Prelude hiding (lookup)

newtype HT k v = HT (H.BasicHashTable k v)

fromList :: (Foldable t, Eq k, Hashable k) => t (k, v) -> HT k v
fromList = HT . unsafePerformIO . H.fromList . toList

lookup :: (Eq k, Hashable k) => HT k v -> k -> Maybe v
lookup (HT h) k = unsafePerformIO $ H.lookup h k
Both uses of unsafePerformIO above should be safe. For that, it is crucial that HT is exported as an abstract type.
Does there exist some other library which defines types and functions which can construct an immutable constant access query O(1) associative array that is not embedded in a stateful monad?
At this point in time, the answer is still no.
As of late-2019 there is an efficient IO-based hashtable package with decent benchmarks.
What you describe seems doable in the same way that pure, immutable Data.Array construction is possible. See Data.Array.Base for how this is achieved via unsafe* operators. A Data.Array is defined with a bound, and my initial thought is that a pure, immutable hashtable will potentially have GC problems if it's allowed to grow without bounds.
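For comparison, the pure, bounded Data.Array construction the answer refers to looks like this (a minimal sketch):

```haskell
import Data.Array

-- A pure, immutable array with fixed bounds and O(1) indexing,
-- built once from a list.
squares :: Array Int Int
squares = listArray (0, 9) [ i * i | i <- [0 .. 9] ]

-- squares ! 3 evaluates to 9
```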


Structure sharing Vector in Haskell

Does Vector in Haskell use structure sharing? In Clojure, modifying the (immutable) vector takes only O(log n) time because it is actually a trie-like structure. (http://hypirion.com/musings/understanding-persistent-vector-pt-1)
Is there an equivalent implementation in Haskell?
Data.Vector is a plain array with O(n) modification.
At the time of writing, there is no equivalent of Clojure's vector.
Data.Sequence is implemented as a finger tree, and it supports a wider range of asymptotically efficient operations than Clojure's vector (O(log(n)) concatenation and splitting, O(1) read/write at both ends), but it's also a bit more heavyweight data structure with more RAM usage and some constant overheads.
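A small sketch of those Data.Sequence operations:

```haskell
import qualified Data.Sequence as Seq
import Data.Sequence (Seq, (<|), (|>), (><))

s :: Seq Int
s = Seq.fromList [1, 2, 3]

-- O(1) access at both ends:
s' :: Seq Int
s' = 0 <| (s |> 4)        -- now contains 0,1,2,3,4

-- O(log(min(n, m))) concatenation:
joined :: Seq Int
joined = s >< s

-- O(log(min(i, n - i))) indexing:
middle :: Int
middle = Seq.index s' 2
```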

Haskell: Data structure with O(1) append and O(1) indexing?

I am looking for a data structure in Haskell that supports both fast indexing and fast append. This is for a memoization problem which arises from recursion.
From the way vectors work in C++ (they are mutable, but that shouldn't matter in this case) it seems immutable vectors with both (amortized) O(1) append and O(1) indexing should be possible (ok, it's not; see the comments on this question). Is this possible in Haskell, or should I go with Data.Sequence, which has (AFAICT) O(1) append and O(log(min(i, n-i))) indexing?
On a related note, as a Haskell newbie I find myself longing for a practical, concise guide to Haskell data structures. Ideally this would give a fairly comprehensive overview over the most practical data structures along with performance characteristics and pointers to Haskell libraries where they are implemented. It seems that there is a lot of information out there, but I have found it to be a little scattered. Am I asking too much?
For simple memoization problems, you typically want to build the table once and then not modify it later. In that case, you can avoid having to worry about appending, by instead thinking of the construction of the memoization table as one operation.
One method is to take advantage of lazy evaluation and refer to the table while we're constructing it.
import Data.Array

-- Integer avoids the Int overflow that sets in around fib 92.
fibs :: Array Int Integer
fibs = listArray (0, n-1) $ 0 : 1 : [fibs!(i-1) + fibs!(i-2) | i <- [2..n-1]]
  where n = 100
This method is especially useful when the dependencies between the elements of the table makes it difficult to come up with a simple order of evaluating them ahead of time. However, it requires using boxed arrays or vectors, which may make this approach unsuitable for large tables due to the extra overhead.
For unboxed vectors, you have operations like constructN which lets you build a table in a pure way while using mutation underneath to make it efficient. It does this by giving the function you pass an immutable view of the prefix of the vector constructed so far, which you can then use to compute the next element.
import qualified Data.Vector.Unboxed as V

fibs :: V.Vector Int
fibs = V.constructN 100 f
  where
    f xs | i < 2     = i
         | otherwise = xs V.! (i-1) + xs V.! (i-2)
      where i = V.length xs
If memory serves, C++ vectors are implemented as an array with capacity and size information. When an insertion would push the size beyond the capacity, the capacity is doubled. This is amortized O(1) insertion (not O(1) as you claim), and it can be emulated just fine in Haskell using the Array type, perhaps wrapped in suitable IO or ST.
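That doubling scheme can be sketched in Haskell with a mutable unboxed vector inside ST (pushAll here is a hypothetical helper, not a library function):

```haskell
import Control.Monad.ST (runST)
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as MV

-- Amortized-O(1) append by capacity doubling, in the style
-- of a C++ vector.
pushAll :: [Int] -> V.Vector Int
pushAll xs0 = runST $ do
  mv0 <- MV.new 1                       -- initial capacity 1
  let go mv len cap (y:ys)
        | len == cap = do               -- full: double the capacity
            mv' <- MV.grow mv cap
            go mv' len (cap * 2) (y:ys)
        | otherwise = do
            MV.write mv len y
            go mv (len + 1) cap ys
      go mv len _ [] =
        V.freeze (MV.slice 0 len mv)    -- copy out the used prefix
  go mv0 0 1 xs0
```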
Take a look at this to make a more informed choice of what you should use.
But the simple thing is, if you want the equivalent of a C++ vector, use Data.Vector.
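A minimal Data.Vector sketch, for completeness:

```haskell
import qualified Data.Vector as V

v :: V.Vector Int
v = V.fromList [10, 20, 30]

-- O(1) indexing:
second :: Int
second = v V.! 1

-- snoc copies the whole array, so a single append is O(n).
v' :: V.Vector Int
v' = V.snoc v 40
```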
