What are the differences between hashtable and hashmap? (Not specific to Java)

During my most recent job interview for a software engineer position, I was asked this question: what are the differences between a hashtable and a hashmap? I asked the interviewer whether he meant specifically Java, since in Java Hashtable is synchronized and HashMap is not (and there are tons of Hashtable-vs-HashMap comparisons for Java a Google search away, so that's not the answer I am looking for), but he said no and wanted me to explain the difference between the two in general.
I was really puzzled and shocked (actually, still puzzled now) by this question. IMO, hashtable vs. hashmap is simply a matter of terminology. Actually, only Java has both terms; other languages like C++ don't even have the term hashtable. During the interview I just explained the principle of hashing, said that a hashmap and a hashtable should both be implemented based on this principle, and said I didn't know of any difference between the two. The interviewer was definitely not convinced and was looking for another answer, and of course I was rejected after that round.
So, back to the topic: what could the differences between a hashmap and a hashtable possibly be in general (not specific to Java), if there are any?

In Computer Science there is a difference due to the wording.
A HashTable is a kind of lookup table that uses key hashes to look up the corresponding value in a table-like data structure. That's only one kind of key-value mapping. There are different implementations, as you are probably aware: different hash functions, hash collision resolution schemes, table growing strategies, and more under the hood. It's only interesting if you need to build your own hash table for whatever reason.
A HashMap is some kind of mapping of key-value pairs with a hashed key. Mapping is abstract as such and it may not be a table. Balanced trees or tries or other data structures/mappings are possible too.
You could simplify and say that a HashTable is the underlying data structure and the HashMap may be utilizing a HashTable.
A Dictionary is yet another abstraction level, since it may not use hashes at all - for example, with full-text binary search lookups or other comparison-based approaches. This is all you can get out of the words without considering specific programming languages.
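To make the table/map split concrete, here is a minimal Haskell sketch of map operations layered on top of a table (illustration only: an IntMap of buckets stands in for a real fixed-size array, which would take the hash modulo the bucket count, and all names are made up):

import Data.Hashable (Hashable, hash)
import qualified Data.IntMap.Strict as IM

-- The "table": buckets indexed by the key's hash, each bucket holding
-- the entries that collided there.
newtype HashTable k v = HashTable (IM.IntMap [(k, v)])

emptyHM :: HashTable k v
emptyHM = HashTable IM.empty

-- The "map" operations are defined on top of the table.
insertHM :: (Eq k, Hashable k) => k -> v -> HashTable k v -> HashTable k v
insertHM k v (HashTable t) = HashTable (IM.alter bucket (hash k) t)
  where
    bucket Nothing   = Just [(k, v)]
    bucket (Just es) = Just ((k, v) : filter ((/= k) . fst) es)

lookupHM :: (Eq k, Hashable k) => k -> HashTable k v -> Maybe v
lookupHM k (HashTable t) = IM.lookup (hash k) t >>= lookup k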
--
Before thinking too much about it: can you say - with certainty - that your interviewer had a clue what he/she was talking about? Did you discuss technical details, or did they just listen/ask and sometimes comment? Sometimes interviewers come up with the most ridiculous answers to problems they don't really understand in the first place.
Like you wrote yourself, in general it's just terminology. Software developers often use the terms interchangeably, except perhaps where there is a real difference, as in Java.

The interviewer may have been looking for the insight that...
a hash table is a lower-level concept that doesn't imply or necessarily support any distinction or separation of keys and values (i.e. you can implement a hash set of values using a hash table), while
a hash map must support distinct keys and values, as there's to be a mapping/association from keys to values; the two are distinct, even if in some implementations they're always stored side by side in memory, e.g. members of the same structure / std::pair<>.
Example: a (bad) hash table implementation preventing use as a hash map.
Consider:
template <typename T>
class Hash_Table
{
    ...
    bool insert(const T& t)
    {
        // work out which bucket t hashes to...
        size_t bucket = hash_bytes((void*)&t, sizeof t) % num_buckets_;
        // see if t is already stored in the bucket...
        if (memcmp((void*)&t, (void*)&buckets_[bucket], sizeof t) == 0)
            ...
        ... handle collisions etc. ...
    }
    ...
};
Above, the hard-coded calls to a hash function that treats the value being inserted as a binary blob, and to memcmp over the entire t, mean you can't make T, say, a std::pair<int, std::string> and use the hash table as a hash map from ints to strings. So, it's an example of a hash table that's not usable as a hash map.
You might or might not also consider a hash table that simply doesn't provide any convenience features for use as a hash map not to be a hash map. For example, if the API was designed as if dealing only in values - h.insert(t); h.erase(t); auto i = h.find(t); - but it allowed the caller to specify arbitrary custom comparison and hashing functions that could restrict their operations to only the key part of t, then the hash table could be (ab)used as a functional hash map.
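To sketch that last point (in Haskell rather than C++, to keep one language for the added examples; the Entry type is made up): if the stored value's equality and hash deliberately look only at the key part, a plain hash set can be (ab)used as a functional hash map:

import Data.Hashable (Hashable (..))
import qualified Data.HashSet as HS

-- Eq and Hashable deliberately ignore the value part, so the set keys
-- entries by entryKey alone.
data Entry k v = Entry { entryKey :: k, entryVal :: v }

instance Eq k => Eq (Entry k v) where
  a == b = entryKey a == entryKey b

instance Hashable k => Hashable (Entry k v) where
  hashWithSalt s = hashWithSalt s . entryKey

-- "Map-like" insert on top of the set: delete any existing entry with
-- the same key first, so we don't depend on the set's behaviour for
-- duplicate inserts. The undefined is never forced, since Eq/Hashable
-- only inspect the key.
insertKV :: (Eq k, Hashable k)
         => k -> v -> HS.HashSet (Entry k v) -> HS.HashSet (Entry k v)
insertKV k v = HS.insert (Entry k v) . HS.delete (Entry k undefined)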
To clarify how this relates to makadev's existing answer, I disagree with:
"A HashTable [uses] key hashes to lookup the corresponding value"; wrong because it assumes a key->value mapping.
"A HashMap [...]. Mapping is abstract as such and it may not be a table. Balanced trees or tries or other data structures/mappings are possible too."; wrong because the primary mechanism of a hash map is still hashing of the key to a bucket (index) in the table/array: some hash tables/maps may use other data structures (arrays, linked lists, trees...) to store elements that collide at the same bucket, but that's a different issue and not part of the difference between hash tables and hash maps.

In Java, Hashtable has effectively become obsolete and HashMap is the better choice, because Hashtable is synchronized. If a thread-safe implementation is not needed, it is recommended to use HashMap in place of Hashtable. If a thread-safe, highly concurrent implementation is desired, then it is recommended to use java.util.concurrent.ConcurrentHashMap in place of Hashtable.
A second difference is that HashMap implements the Map interface, whereas Hashtable extends the legacy Dictionary class.

Related

Haskell alternative for Doubly-linked-list coupled with Hash-table pattern

There's a useful pattern in imperative programming, namely, a doubly-linked-list coupled with a hash-table for constant time lookup in the linked list.
One application of this pattern is in an LRU cache. The head of the doubly-linked-list contains the least recently used entry in the cache and the last element contains the most recently used entry. The keys in the hash-table are the keys of the entries, and the values are pointers to the nodes in the linked-list corresponding to each key/entry. When an entry is queried in the cache, the hash-table is used to find its node in the linked-list; the node is then removed from its current location and placed at the end of the linked-list, making it the most-recently-used entry. For eviction, we simply remove entries from the head of the linked-list, as they are the least recently used ones. Both lookup and eviction take constant time.
I can think of implementing this in Haskell using two TreeMaps, and I know the time complexity will be O(log n). But I am a little uncomfortable, as the constant factor seems a little high. Specifically, to perform a lookup, I first need to check whether the entry exists and save its value, then delete it from the LRU map and re-insert it with a new key. This means each lookup results in three root-to-node traversals.
Is there a better way of doing this in Haskell?
As the comments indicate, mutable vectors are perfectly acceptable when required. However, I think there's an issue with the way you've stated the question - unless the idea is to duplicate the imperative code "as closely as possible" (without mutable structures), why bother having two TreeMaps? A single priority search queue (see the pqueue or PSQueue packages) would be an appropriate structure while maintaining purity. It efficiently supports both priorities (for eviction) and searching (for lookups of your desired cached argument).
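As an illustration, here is a minimal pure LRU sketch along those lines, using Data.OrdPSQ from the psqueues package (one of several priority-search-queue libraries; LRU, lookupLRU and insertLRU are made-up names) with a monotonically increasing tick as the priority:

import qualified Data.OrdPSQ as PSQ

data LRU k v = LRU
  { capacity :: !Int
  , tick     :: !Int                 -- next "timestamp"
  , queue    :: !(PSQ.OrdPSQ k Int v)
  }

emptyLRU :: Int -> LRU k v
emptyLRU cap = LRU cap 0 PSQ.empty

-- Lookup returns the value and bumps its priority to "now" in one pass.
lookupLRU :: Ord k => k -> LRU k v -> (Maybe v, LRU k v)
lookupLRU k (LRU cap t q) = (mv, LRU cap (t + 1) q')
  where
    (mv, q') = PSQ.alter bump k q
    bump Nothing        = (Nothing, Nothing)       -- not cached
    bump (Just (_p, v)) = (Just v, Just (t, v))    -- touch: fresh priority

-- Insert an entry; evict the minimum (least recently used) if over capacity.
insertLRU :: Ord k => k -> v -> LRU k v -> LRU k v
insertLRU k v (LRU cap t q)
  | PSQ.size q' > cap = LRU cap (t + 1) (PSQ.deleteMin q')
  | otherwise         = LRU cap (t + 1) q'
  where
    q' = PSQ.insert k t v q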
On a related note, some structures support eg. Data.Map's alterF, which effectively provides you with a continuation allowing you to "do something else" dependent on the Maybe value at a key, but "remembering" where you are and thus avoiding to pay the full cost to re-traverse the structure to subsequently modify at this key. See also the at lens.
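For instance, a lookup-and-overwrite in a single traversal, using the pair functor with Data.Map's alterF to carry the old value out (lookupAndSet is a made-up name):

import qualified Data.Map.Strict as Map

-- Returns the previous value (if any) and the updated map, traversing
-- the tree only once instead of lookup-then-insert.
lookupAndSet :: Ord k => k -> v -> Map.Map k v -> (Maybe v, Map.Map k v)
lookupAndSet k newV = Map.alterF (\old -> (old, Just newV)) k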

Is this approach to dealing with hash collisions new/unique?

When dealing with hash maps, I have seen a few strategies to deal with hash collisions, but we have come up with something different.
I was wondering if this is something new or not.
This version of a hash map only works if the hash function and the data structures that will be hashed are saltable.
(This is the case in hashable in Haskell, where we suggested implementing this approach.)
The idea is that, instead of storing a list or array in each cell of the hash map, you store a recursive hash map. The only difference in this recursive hash map is that you use a different salt.
This way the hash collisions on one level of the hash map are most likely not hash collisions on the next level.
As a result, insertion into such a hash map is no longer O(number of collisions for this hash) but O(number of levels at which this key keeps colliding, recursively), which is most likely better.
A more detailed explanation and an implementation can be found here:
https://github.com/tibbe/unordered-containers/pull/217/files/58af4519ace34c5f7d3c1359907ff75e27b9cdb8#diff-ba23e0f18c79cb873ac5375367524cfaR114
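For readers who don't follow the link, here is a minimal self-contained sketch of the idea (not the actual unordered-containers code: an IntMap stands in for the bucket array, and hashWithSalt comes from the hashable package):

import Data.Hashable (Hashable, hashWithSalt)
import qualified Data.IntMap.Strict as IM

-- Each level hashes with its own salt; colliding entries fall through
-- to a sub-map hashed with the next salt.
data Node k v
  = Leaf k v
  | Sub (SaltedMap k v)

newtype SaltedMap k v = SaltedMap (IM.IntMap (Node k v))

empty :: SaltedMap k v
empty = SaltedMap IM.empty

insert :: (Eq k, Hashable k) => k -> v -> SaltedMap k v -> SaltedMap k v
insert = go 0
  where
    go s k v (SaltedMap m) = SaltedMap (IM.alter step (hashWithSalt s k) m)
      where
        step Nothing = Just (Leaf k v)
        step (Just (Leaf k' v'))
          | k' == k   = Just (Leaf k v)   -- same key: overwrite
          | otherwise =                   -- collision: push both down a level
              Just (Sub (go (s + 1) k' v' (go (s + 1) k v empty)))
        step (Just (Sub sub)) = Just (Sub (go (s + 1) k v sub))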
Your idea seems to be effectively the same as the one suggested in the Fredman, Komlós & Szemerédi paper from 1984. As Wikipedia summarizes it:
FKS Hashing makes use of a hash table with two levels in which the top level contains n buckets which each contain their own hash table.
In contrast to your idea, the local hash maps aren't recursive; instead, each of them chooses a salt that makes it a perfect hash. In practice this will (as you say) usually be achieved by the first salt you try, so it's asymptotically constant time.

Can I use the hashes in a HashMap directly?

Is it possible to insert and get values into/from a HashMap directly with a Hash provided, so I can cache the hashes?
I want to do something like this:
map.insert(key, "value");
let hashed_key = {
    let mut hasher = map.hasher().build_hasher();
    key.hash(&mut hasher);
    hasher.finish()
};
assert_eq!(map.get(key).unwrap(), map.get_by_hash(hashed_key).unwrap());
No.
This is fundamentally impossible at the algorithmic level.
By design, hashing is not injective: multiple elements may hash to the same value. Therefore, any HashMap implementation can only use the hash as a hint, and must then do a full equality comparison to check whether the element found via the hint is the right one (or not).
At best, a get_by_hash method would return an Iterator of all possible elements matching the current hash.
For a degenerate case, consider a hashing algorithm that always returns 4 (chosen by a fair dice roll). Which element would you expect it to return?
Work-around
If caching is what you are after, the trick in languages with no HashBuilder is to pre-hash (and cache) the hash inside the key itself.
It requires caching the full key (because of the equality check), but hashing is then a very simple operation (return the cached value).
It does not, however, speed up the equality check, which depending on the value may be quite expensive.
You could adapt the pattern to Rust, although you would lose the advantage of using a HashBuilder.
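Sketching the pre-hashing trick in Haskell with the hashable package (the Cached wrapper is made up for illustration; the same shape works as a Rust newtype holding the key and its u64 hash):

import Data.Hashable (Hashable (..), hash)

-- The hash is computed once, up front; re-hashing later is just
-- re-salting the cached Int.
data Cached k = Cached
  { cachedHash :: !Int
  , cachedKey  :: !k
  }

mkCached :: Hashable k => k -> Cached k
mkCached k = Cached (hash k) k

-- Equality must still inspect the full key: this is the part that
-- pre-hashing cannot speed up.
instance Eq k => Eq (Cached k) where
  a == b = cachedKey a == cachedKey b

instance Hashable (Cached k) where
  hashWithSalt s (Cached h _) = hashWithSalt s h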

Haskell data structure that is efficient for swapping elements?

I am looking for a Haskell data structure that stores an ordered list of elements and that is time-efficient at swapping pairs of elements at arbitrary locations within the list. It's not [a], obviously. It's not Vector because swapping creates new vectors. Which data structure is efficient at this?
The most efficient implementations of persistent data structures, which exhibit effectively constant-time updates (as well as appending, prepending, counting and slicing), are based on the Array Mapped Trie algorithm. The Vector data structures of Clojure and Scala are based on it, for instance. The only Haskell implementation of that data structure that I know of is provided by the "persistent-vector" package.
This algorithm is quite young - it was first presented in 2000 - which might be why not so many people have heard about it. But it turned out to be such a universal solution that it was soon adapted for hash tables. The adapted version of the algorithm is called Hash Array Mapped Trie. It is likewise used in Clojure and Scala to implement their Set and Map data structures, and it is well established in Haskell too, with packages like "unordered-containers" and "stm-containers" built around it.
To learn more about the algorithm I recommend the following links:
http://blog.higher-order.net/2009/02/01/understanding-clojures-persistentvector-implementation.html
http://lampwww.epfl.ch/papers/idealhashtrees.pdf
Data.Sequence from the containers package would likely be a not-terrible data structure to start with for this use case.
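For example, a swap over Data.Sequence is just two lookups and two updates, each O(log n) (swapSeq is a made-up helper and assumes both indices are in bounds):

import qualified Data.Sequence as Seq

-- Swap the elements at positions i and j: two indexing steps and two
-- updates, each logarithmic in the distance to the nearer end.
swapSeq :: Int -> Int -> Seq.Seq a -> Seq.Seq a
swapSeq i j s = Seq.update j x (Seq.update i y s)
  where
    x = Seq.index s i
    y = Seq.index s j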
Haskell is a (nearly) pure functional language, so any data structure you update needs to be partially copied; re-using the unchanged parts of the structure is close to the best you can do. Also, the new structure would be lazily evaluated, and typically only the spine needs to be created until you need the data. If the number of updates is small compared to the number of elements, you could make a difference list that checks a sparse set of updates first, and only then looks in the original vector.

What's the most idiomatic approach to multi-index collections in Haskell?

In C++ and other languages, add-on libraries implement a multi-index container, e.g. Boost.Multiindex. That is, a collection that stores one type of value but maintains multiple different indices over those values. These indices provide for different access methods and sorting behaviors, e.g. map, multimap, set, multiset, array, etc. Run-time complexity of the multi-index container is generally the sum of the individual indices' complexities.
Is there an equivalent for Haskell or do people compose their own? Specifically, what is the most idiomatic way to implement a collection of type T with both a set-type of index (T is an instance of Ord) as well as a map-type of index (assume that a key value of type K could be provided for each T, either explicitly or via a function T -> K)?
I just uploaded IxSet to hackage this morning,
http://hackage.haskell.org/package/ixset
ixset provides sets which have multiple indexes.
ixset has been around for a long time as happstack-ixset. This version removes the dependencies on anything happstack specific, and is the new official version of IxSet.
Another option would be kdtree:
darcs get http://darcs.monoid.at/kdtree
kdtree aims to improve on IxSet by offering greater type-safety and better time and space usage. The current version seems to do well on all three of those aspects -- but it is not yet ready for prime time. Additional contributors would be highly welcomed.
In the trivial case where every element has a unique key that's always available, you can just use a Map and extract the key to look up an element. In the slightly less trivial case where each value merely has a key available, a simple solution would be something like Map K (Set T). Looking up an element directly would then involve first extracting the key, indexing the Map to find the set of elements that share that key, then looking up the one you want.
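A sketch of that nesting approach, keeping the two indices in sync behind one API (TwoIndex and the function names are made up, not a library):

import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- One collection, two views: the elements in their own order, and the
-- same elements grouped under a derived key.
data TwoIndex k t = TwoIndex
  { byValue :: !(Set.Set t)
  , byKey   :: !(Map.Map k (Set.Set t))
  }

emptyTI :: TwoIndex k t
emptyTI = TwoIndex Set.empty Map.empty

insertTI :: (Ord k, Ord t) => (t -> k) -> t -> TwoIndex k t -> TwoIndex k t
insertTI key t (TwoIndex vs ks) =
  TwoIndex (Set.insert t vs)
           (Map.insertWith Set.union (key t) (Set.singleton t) ks)

lookupByKey :: Ord k => k -> TwoIndex k t -> Set.Set t
lookupByKey k = Map.findWithDefault Set.empty k . byKey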
For the most part, if something can be done straightforwardly in the above fashion (simple transformation and nesting), it probably makes sense to do it that way. However, none of this generalizes well to, e.g., multiple independent keys or keys that may not be available, for obvious reasons.
Beyond that, I'm not aware of any widely-used standard implementations. Some examples do exist, for example IxSet from happstack seems to roughly fit the bill. I suspect one-size-kinda-fits-most solutions here are liable to have a poor benefit/complexity ratio, so people tend to just roll their own to suit specific needs.
Intuitively, this seems like a problem that might work better not as a single implementation, but rather a collection of primitives that could be composed more flexibly than Data.Map allows, to create ad-hoc specialized structures. But that's not really helpful for short-term needs.
For this specific question, you can use a Bimap. In general, though, I'm not aware of any common class for multimaps or multiply-indexed containers.
I believe the simplest way to do this is simply with Data.Map. Although it is designed around a single index, when you insert the same element multiple times, most compilers (certainly GHC) will share the value rather than copy it. A separate multimap implementation wouldn't be that efficient: you want to find elements based on their index, so you cannot naively associate each element with multiple indices - say [([key], value)] - as that would be very inefficient.
However, I have not looked at Boost's multi-index implementation to see, definitively, whether there is an optimized way of doing this.
Have I got the problem straight? Both T and K have an order. There is a function key :: T -> K but it is not order-preserving. It is desired to manage a collection of Ts, indexed (for rapid access) both by the T order and the K order. More generally, one might want a collection of T elements indexed by a bunch of orders key1 :: T -> K1, .. keyn :: T -> Kn, and it so happens that here key1 = id. Is that the picture?
I think I agree with gereeter's suggestion that the basis for a solution is just to maintain in sync a bunch of maps (Map K1 T, .. Map Kn T). Inserting a key-value pair in a map duplicates neither the key nor the value, allocating only the extra heap required to make a new entry in the right place in the index. Inserting the same value, suitably keyed, into multiple indices should not break sharing (even if one of the keys is the value). It is worth wrapping the structure in an API which ensures that any subsequent modifications to the value are computed once and shared, rather than recomputed for each entry in an index.
Bottom line: it should be possible to maintain multiple maps, ensuring that the values are shared, even though the key-orders are separate.

Resources