I have a question: if we need to use a custom object as a key, do we just need to override the equals and hashCode methods, or should the class also be immutable?
I ask because String is immutable, and that is why we prefer it as a key in a HashMap; following that logic, I raised this question.
It depends on your hashing function.
Every field used by hashCode (and equals) should be immutable. Otherwise, once you alter one of those fields, you lose access to your value. The map filed the entry under the old hash, so a lookup key must both hash to that old value and compare equal to the now-modified stored key. The odds of finding your value again using any other key object are quite low.
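Here is a minimal Java sketch of the failure mode described above. It assumes a hypothetical MutableKey class whose equals and hashCode are correctly overridden but depend on a mutable field:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class MutableKeyDemo {
    // Hypothetical key class: equals/hashCode are overridden, but the field is mutable
    static final class MutableKey {
        String name;

        MutableKey(String name) { this.name = name; }

        @Override public boolean equals(Object o) {
            return o instanceof MutableKey && Objects.equals(name, ((MutableKey) o).name);
        }

        @Override public int hashCode() { return Objects.hashCode(name); }
    }

    public static void main(String[] args) {
        Map<MutableKey, Integer> map = new HashMap<>();
        MutableKey key = new MutableKey("alice");
        map.put(key, 42);

        key.name = "bob"; // mutate a field that hashCode() depends on

        // null: the new hash does not match the hash the entry was filed under
        System.out.println(map.get(key));
        // null: the hash matches, but equals() against the mutated stored key fails
        System.out.println(map.get(new MutableKey("alice")));
        // true: the entry is still there, just unreachable by key
        System.out.println(map.containsValue(42));
    }
}
```

So overriding equals and hashCode is necessary but not sufficient: the fields they read must not change while the object is in use as a key. String satisfies this automatically because it is immutable.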
I am trying to understand HashMap concepts. I understand that it can be useful to find a certain object by hashing an object to find its location in memory.
However, why can't we have a property of an object correspond to its position in memory, so we could refer to that when searching for an object? As for insertion, we could have a counter store the number of objects, so that insertion could be O(1).
Why isn't this feasible?
I understand that it can be useful to find a certain object by hashing an object to find its location in memory.
Hashing does not do that! What it does is compute a number from the value of the object. Even when Object::hashCode is not overridden, the hash code is still not the object's address in memory.
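This small Java sketch makes the "compute a number from the value" point concrete. The bucketIndex helper is hypothetical. It mimics the way java.util.HashMap in recent JDKs mixes the high bits of hashCode() into the low bits and masks the result to a power-of-two table size, but that is an internal detail, not a public API.

```java
class BucketIndexDemo {
    // Hypothetical helper: turns a value-based hash code into an index for a table
    // whose size is a power of two (an assumption of this sketch)
    static int bucketIndex(Object key, int tableSize) {
        int h = key.hashCode();                    // computed from the key's value, not its address
        return (h ^ (h >>> 16)) & (tableSize - 1); // mix high bits in, then mask to the table size
    }

    public static void main(String[] args) {
        String a = new String("key");
        String b = new String("key"); // a different object holding the same characters
        System.out.println(a == b);                                   // false
        System.out.println(a.hashCode() == b.hashCode());             // true
        System.out.println(bucketIndex(a, 16) == bucketIndex(b, 16)); // true
    }
}
```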
One problem with addresses (and why they can't be used) is that an object's address changes when the GC moves it. So each time the GC ran you would need to rebuild all of the hash tables. Identity hashcodes don't have this problem / cost. An object's identity hashcode never changes.
A second problem is that if you used addresses (only) as the hashcodes, you wouldn't be able to handle the (more common) case of hashing based on the value of the key.
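You can see the difference between identity-based and value-based hashing with java.util.IdentityHashMap, which hashes keys with System.identityHashCode and compares them with ==, next to an ordinary HashMap. The lookup helper below is hypothetical and only for illustration:

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

class IdentityVsValueLookup {
    // Stores under one String object, then looks up with an equal but distinct one
    static Integer lookup(Map<String, Integer> map) {
        String a = new String("key");
        String b = new String("key"); // equal value, different object
        map.put(a, 1);
        return map.get(b);
    }

    public static void main(String[] args) {
        // IdentityHashMap: identity hash + reference equality, so b does not find a's entry
        System.out.println(lookup(new IdentityHashMap<>())); // null
        // HashMap: hashCode() + equals(), so b finds a's entry
        System.out.println(lookup(new HashMap<>()));         // 1
    }
}
```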
Is it possible to insert and get values into/from a HashMap directly with a Hash provided, so I can cache the hashes?
I want to do something like this:
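use std::collections::HashMap;
use std::hash::{BuildHasher, Hash, Hasher};

let mut map = HashMap::new();
let key = "key"; // example key
map.insert(key, "value");
let hashed_key = {
    let mut hasher = map.hasher().build_hasher();
    key.hash(&mut hasher);
    hasher.finish()
};
// get_by_hash is the method I am looking for; HashMap does not provide it
assert_eq!(map.get(key).unwrap(), map.get_by_hash(hashed_key).unwrap());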
No.
This is fundamentally impossible at the algorithmic level.
By design, a hash function is not injective: multiple elements may hash to the same value. Therefore, any HashMap implementation can only use the hash as a hint. It must then use a full equality comparison to check whether the element found by the hint is the right one.
At best, a get_by_hash method would return an Iterator of all possible elements matching the current hash.
For a degenerate case, consider a hashing algorithm that always returns 4 (chosen by the roll of a fair die). Which element would you expect it to return?
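In Java terms, the degenerate case still works, just slowly. A hashCode that always returns 4 is legal because equal keys still get equal hashes, and the map falls back on equals to pick the right entry. DiceKey is a hypothetical class for illustration:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class ConstantHashDemo {
    // Hypothetical key whose hash is always 4: legal, since equal keys still hash equally
    static final class DiceKey {
        final String name;

        DiceKey(String name) { this.name = name; }

        @Override public int hashCode() { return 4; } // chosen by fair die roll

        @Override public boolean equals(Object o) {
            return o instanceof DiceKey && Objects.equals(name, ((DiceKey) o).name);
        }
    }

    public static void main(String[] args) {
        Map<DiceKey, Integer> map = new HashMap<>();
        map.put(new DiceKey("a"), 1);
        map.put(new DiceKey("b"), 2);
        // Every key lands in the same bucket; equals() picks the right entry
        System.out.println(map.get(new DiceKey("a"))); // 1
        System.out.println(map.get(new DiceKey("b"))); // 2
        System.out.println(map.get(new DiceKey("c"))); // null
    }
}
```

Every lookup now has to search the single shared bucket, which is exactly why the hash alone can only be a hint.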
Work-around
If caching is what you are after, the trick in languages without a BuildHasher is to pre-hash the key and cache the hash inside the key itself.
It requires caching the full key (because of the equality check), but hashing is then a very simple operation (return the cached value).
It does not, however, speed up the equality check, which depending on the value may be quite expensive.
You could adapt the pattern to Rust, although you would lose the advantage of using a BuildHasher.
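Here is a sketch of the pre-hashing trick in Java, a language without BuildHasher. It assumes a hypothetical immutable CachedHashKey that wraps an int array. java.lang.String does something similar internally: it caches its hash code after computing it the first time.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class CachedHashKeyDemo {
    // Hypothetical immutable key that computes its hash once, in the constructor
    static final class CachedHashKey {
        private final int[] data;
        private final int hash;

        CachedHashKey(int[] data) {
            this.data = data.clone();               // defensive copy keeps the key immutable
            this.hash = Arrays.hashCode(this.data); // computed once, O(n)
        }

        @Override public int hashCode() { return hash; } // O(1) from then on

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof CachedHashKey)) return false;
            CachedHashKey other = (CachedHashKey) o;
            // Cheap pre-check on the cached hashes, then the full O(n) comparison,
            // which caching does not speed up
            return hash == other.hash && Arrays.equals(data, other.data);
        }
    }

    public static void main(String[] args) {
        Map<CachedHashKey, String> map = new HashMap<>();
        CachedHashKey key = new CachedHashKey(new int[] {1, 2, 3});
        map.put(key, "value");
        System.out.println(map.get(key));                                    // value
        System.out.println(map.get(new CachedHashKey(new int[] {1, 2, 3}))); // value
    }
}
```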
During my most recent job interview for a software engineer position, I was asked this question: what are the differences between a hashtable and a hashmap? I asked the interviewer whether he meant Java specifically, since in Java Hashtable is synchronized and HashMap is not. There is also tons of information comparing Hashtable and HashMap in Java, so that's not the answer I am looking for. He said no and wanted me to explain the difference between the two in general.
I was really puzzled and shocked by this question (and am still puzzled now). IMO, hashtable versus hashmap is simply a matter of terminology. Actually, only Java has both terms; other languages like C++ don't even have the term hashtable. During the interview, I just explained the principle of hashing. I said that hashmap and hashtable should both be implemented based on this principle and that I didn't know of any difference between the two. The interviewer was definitely not convinced and was looking for other answers, and of course I was rejected after that round.
So back to the topic, what could possibly be the differences between hashmap and hashtable in general (not specific to Java) if there is any?
In Computer Science there is a difference due to the wording.
A HashTable is some kind of lookup table using key hashes to lookup the corresponding value in a table-like data structure. That's only one kind of key-value mapping. There are different implementations, as you are probably aware: different hash functions, hash collision resolution strategies, table growing strategies and more under the hood. It's only interesting if you need to make your own hash table for whatever reason.
A HashMap is some kind of mapping of key-value pairs with a hashed key. Mapping is abstract as such and it may not be a table. Balanced trees or tries or other data structures/mappings are possible too.
You could simplify and say that a HashTable is the underlying data structure and the HashMap may be utilizing a HashTable.
A Dictionary is yet another abstraction level, since it may not use hashes at all. For example, it could use binary search lookups that compare full keys, or other comparison-based approaches. This is all you can get out of the words without considering specific programming languages.
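In Java, the mapping abstraction this answer describes corresponds roughly to the Map interface. HashMap implements it with a hash table, while TreeMap implements it with a red-black tree and compares keys instead of hashing them. Here is a minimal sketch (fillAndGet is a hypothetical helper):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class MapImplementations {
    // Works against the abstract Map API only; it does not care how lookups happen
    static Integer fillAndGet(Map<String, Integer> map) {
        map.put("pear", 3);
        map.put("apple", 1);
        map.put("fig", 2);
        return map.get("apple");
    }

    public static void main(String[] args) {
        // HashMap: hashes keys into buckets of an array
        System.out.println(fillAndGet(new HashMap<>())); // 1
        // TreeMap: no hashing, just compareTo() while walking a balanced tree
        TreeMap<String, Integer> sorted = new TreeMap<>();
        System.out.println(fillAndGet(sorted));          // 1
        System.out.println(sorted.firstKey());           // apple (keys kept in sorted order)
    }
}
```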
--
Before thinking too much about it: can you say, with certainty, that your interviewer had a clue what he or she was talking about? Did you discuss technical details, or did they just listen/ask and sometimes comment? Sometimes interviewers come up with the most ridiculous answers to problems they don't really understand in the first place.
As you wrote yourself, in general it's just terminology. Software developers often use the terms interchangeably, except perhaps in languages where they really are different, like Java.
The interviewer may have been looking for the insight that...
a hash table is a lower-level concept that doesn't imply or necessarily support any distinction or separation of keys and values (i.e. you can implement a hash set of values using a hash table), while
a hash map must support distinct keys and values, as there's to be a mapping/association from keys to values; the two are distinct, even if in some implementations they're always stored side by side in memory, e.g. members of the same structure / std::pair<>.
Example: a (bad) hash table implementation preventing use as a hash map.
Consider:
template <typename T>
class Hash_Table
{
...
bool insert(const T& t)
{
// work out which bucket t hashes to...
size_t bucket = hash_bytes((void*)&t, sizeof t) % num_buckets_;
// see if t is already stored in the bucket...
if (memcmp((void*)&t, (void*)&buckets_[bucket], sizeof t) == 0)
...
... handle collisions etc. ...
}
...
};
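Above, the hard-coded call to a hash function that treats the value being inserted as a binary blob, and the memcmp of the entire t, mean you can't make T, say, a std::pair<int, std::string> and use the hash table as a hash map from ints to strings. Both the hash and the comparison cover the whole pair, value included, rather than just the key, and for std::string they would inspect its internal representation rather than its characters. So, it's an example of a hash table that's not usable as a hash map.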
You might also argue that a hash table that simply doesn't provide any convenience features for use as a hash map is not a hash map. For example, the API might be designed as if dealing only in values (h.insert(t); h.erase(t); auto i = h.find(t);). If it still let the caller specify arbitrary custom comparison and hashing functions that restrict their operations to only the key part of t, the hash table could be (ab)used as a functional hash map.
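Here is a Java sketch of that second point. It assumes a hypothetical ValueHashTable<T> with separate chaining, whose API deals only in whole values but accepts caller-supplied hash and equality functions. Restricting both functions to the key of a Map.Entry turns it into a functional hash map:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.ToIntFunction;

// Hypothetical value-only hash table: its API knows nothing about keys
class ValueHashTable<T> {
    private final List<List<T>> buckets = new ArrayList<>();
    private final ToIntFunction<T> hash;
    private final BiPredicate<T, T> same;

    ValueHashTable(int numBuckets, ToIntFunction<T> hash, BiPredicate<T, T> same) {
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new ArrayList<>());
        }
        this.hash = hash;
        this.same = same;
    }

    private List<T> bucketFor(T t) {
        // floorMod keeps the index non-negative for negative hash codes
        return buckets.get(Math.floorMod(hash.applyAsInt(t), buckets.size()));
    }

    // Adds t unless the caller's equality function says it is already present
    boolean insert(T t) {
        List<T> bucket = bucketFor(t);
        for (T u : bucket) {
            if (same.test(u, t)) {
                return false;
            }
        }
        bucket.add(t);
        return true;
    }

    // Returns the stored element considered "the same" as probe, or null
    T find(T probe) {
        for (T u : bucketFor(probe)) {
            if (same.test(u, probe)) {
                return u;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hash and compare only the key of each entry: the table now acts as a map
        ValueHashTable<Map.Entry<Integer, String>> h = new ValueHashTable<Map.Entry<Integer, String>>(
                16,
                e -> Integer.hashCode(e.getKey()),
                (a, b) -> a.getKey().equals(b.getKey()));
        h.insert(new SimpleEntry<Integer, String>(1, "one"));
        h.insert(new SimpleEntry<Integer, String>(2, "two"));
        // The probe's value part is a dummy; only its key matters
        System.out.println(h.find(new SimpleEntry<Integer, String>(2, null)).getValue()); // two
    }
}
```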
To clarify how this relates to makadev's existing answer, I disagree with:
"A HashTable [uses] key hashes to lookup the corresponding value"; wrong because it assumes a key->value mapping.
"A HashMap [...]. Mapping is abstract as such and it may not be a table. Balanced trees or tries or other data structures/mappings are possible too."; wrong because the primary mechanism of a hash map is still hashing of the key to a bucket (index) in the table/array: some hash tables/maps may use other data structures (arrays, linked lists, trees...) to store elements that collide at the same bucket, but that's a different issue and not part of the difference between hash tables and hash maps.
Actually, Hashtable is effectively obsolete, and HashMap is usually the better approach because Hashtable is synchronized whether you need it or not. If a thread-safe implementation is not needed, it is recommended to use HashMap in place of Hashtable. If a thread-safe highly-concurrent implementation is desired, then it is recommended to use java.util.concurrent.ConcurrentHashMap in place of Hashtable.
A second difference is that HashMap extends AbstractMap and implements the Map interface, whereas Hashtable extends the legacy Dictionary class (and has also implemented Map since Java 1.2).
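Here is a short Java sketch of the recommended replacements and of where Hashtable sits in the class hierarchy. The class and method names are hypothetical; Hashtable, Dictionary, Map, HashMap and ConcurrentHashMap.merge are the real java.util / java.util.concurrent types and methods:

```java
import java.util.Dictionary;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class HashtableAlternatives {
    // Several threads bump one counter; ConcurrentHashMap.merge updates each key atomically
    static Map<String, Integer> countConcurrently(int threads, int perThread) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counts.merge("hits", 1, Integer::sum);
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Single-threaded use: plain HashMap, no locking overhead
        Map<String, Integer> local = new HashMap<>();
        local.put("hits", 1);
        System.out.println(local.get("hits")); // 1

        // The legacy class sits in both hierarchies
        Object legacy = new Hashtable<String, Integer>();
        System.out.println(legacy instanceof Map);        // true
        System.out.println(legacy instanceof Dictionary); // true

        // Shared between threads: ConcurrentHashMap instead of Hashtable
        System.out.println(countConcurrently(4, 1000).get("hits")); // 4000
    }
}
```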
I have key-value pairs (both keys and values are static). The key is a string and the value is an integer. When I get a string key, I need to find its value. Earlier I used a HashMap, but it is slow when I have a large entry set. Can you tell me the fastest way to map a string to an integer?
HashMap is a very efficient data structure for this task. It's unlikely that you'd be able to improve on the existing code.
However, this article may provide some insight - Is it possible to map string to int faster than using hashmap?
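If the entries are static, one option is to build the map once and reuse it for every lookup. This Java sketch assumes a hypothetical three-entry table. It presizes the HashMap for the default 0.75 load factor so the table never resizes while being filled. Each String also caches its own hash code, so looking up the same String instance repeatedly does not rehash its characters. Whether this helps in your case is something to measure.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class StaticLookup {
    // Built once when the class is initialized; read-only afterwards
    private static final Map<String, Integer> CODES = buildCodes();

    private static Map<String, Integer> buildCodes() {
        // Hypothetical entries; the real table would be loaded here instead
        String[] keys = {"alpha", "beta", "gamma"};
        // Presize for the default load factor (0.75) so the table never resizes while filling
        Map<String, Integer> m = new HashMap<>((int) (keys.length / 0.75f) + 1);
        for (int i = 0; i < keys.length; i++) {
            m.put(keys[i], i + 1);
        }
        return Collections.unmodifiableMap(m);
    }

    // Average O(1): one hashCode() call (cached inside each String) plus an equals() check
    static int lookup(String key) {
        Integer value = CODES.get(key);
        if (value == null) {
            throw new IllegalArgumentException("unknown key: " + key);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(lookup("beta")); // 2
    }
}
```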
If I remember correctly, even when we pass a large string or byte array as a parameter to a method, only a pointer to the data on the heap is passed rather than the full data. So it should not degrade performance or pile up memory unnecessarily. I just want to confirm whether my understanding of the above statement is correct.
I know it is better to keep the string or byte array as a private field in the class and access it in every method that requires it, thus eliminating one additional parameter from each method call.
If I remember correctly, even when we pass a large string or byte array as a parameter to a method, only a pointer to the data on the heap is passed rather than the full data.
Yes, when you pass any reference type argument, just the reference is passed, by value. Note that this is not the same as "pass by reference", which is applicable to parameters of both reference types and value types.
See my articles on reference types and value types and parameter passing for more information.
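The same rule holds in Java. Every argument is passed by value, and for arrays and other objects that value is a reference. Here is a minimal sketch (the class and method names are hypothetical). Unlike C#, Java has no ref parameters, so it offers no true pass by reference at all.

```java
class PassByValueDemo {
    // Reassigning the parameter only changes this method's copy of the reference
    static void reassign(byte[] data) {
        data = new byte[] {9, 9, 9};
    }

    // Writing through the copied reference changes the caller's array (same object)
    static void mutate(byte[] data) {
        data[0] = 42;
    }

    public static void main(String[] args) {
        byte[] big = new byte[10_000_000]; // ~10 MB, but each call copies only the reference
        reassign(big);
        System.out.println(big.length); // 10000000
        mutate(big);
        System.out.println(big[0]);     // 42
    }
}
```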
I know it is better to keep the string or byte array as a private field in the class and access it in every method that requires it, thus eliminating one additional parameter from each method call.
That entirely depends on the context. Is it logically part of the state of the class or not?