Does time complexity of Hashmap get() and put() operation is O(1) at all time [duplicate]

Does time complexity of Hashmap get() and put() operation is O(1) at all time [duplicate] - hashmap

We are used to saying that HashMap get/put operations are O(1). However it depends on the hash implementation. The default object hash is actually the internal address in the JVM heap. Are we sure it is good enough to claim that the get/put are O(1)?
Available memory is another issue. As I understand from the javadocs, the HashMap load factor should be 0.75. What if we do not have enough memory in JVM and the load factor exceeds the limit?
So, it looks like O(1) is not guaranteed. Does it make sense or am I missing something?

It depends on many things. It's usually O(1), with a decent hash which itself is constant time... but you could have a hash which takes a long time to compute, and if there are multiple items in the hash map which return the same hash code, get will have to iterate over them calling equals on each of them to find a match.
In the worst case, a HashMap has an O(n) lookup due to walking through all entries in the same hash bucket (e.g. if they all have the same hash code). Fortunately, that worst case scenario doesn't come up very often in real life, in my experience. So no, O(1) certainly isn't guaranteed - but it's usually what you should assume when considering which algorithms and data structures to use.
In JDK 8, HashMap has been tweaked so that if keys can be compared for ordering, then any densely-populated bucket is implemented as a tree, so that even if there are lots of entries with the same hash code, the complexity is O(log n). That can cause issues if you have a key type where equality and ordering are different, of course.
And yes, if you don't have enough memory for the hash map, you'll be in trouble... but that's going to be true whatever data structure you use.

It has already been mentioned that hashmaps are O(n/m) in average, if n is the number of items and m is the size. It has also been mentioned that in principle the whole thing could collapse into a singly linked list with O(n) query time. (This all assumes that calculating the hash is constant time).
However what isn't often mentioned is, that with probability at least 1-1/n (so for 1000 items that's a 99.9% chance) the largest bucket won't be filled more than O(logn)! Hence matching the average complexity of binary search trees. (And the constant is good, a tighter bound is (log n)*(m/n) + O(1)).
All that's required for this theoretical bound is that you use a reasonably good hash function (see Wikipedia: Universal Hashing. It can be as simple as a*x>>m). And of course that the person giving you the values to hash doesn't know how you have chosen your random constants.
TL;DR: With Very High Probability the worst case get/put complexity of a hashmap is O(logn).

I'm not sure the default hashcode is the address - I read the OpenJDK source for hashcode generation a while ago, and I remember it being something a bit more complicated. Still not something that guarantees a good distribution, perhaps. However, that is to some extent moot, as few classes you'd use as keys in a hashmap use the default hashcode - they supply their own implementations, which ought to be good.
On top of that, what you may not know (again, this is based in reading source - it's not guaranteed) is that HashMap stirs the hash before using it, to mix entropy from throughout the word into the bottom bits, which is where it's needed for all but the hugest hashmaps. That helps deal with hashes that specifically don't do that themselves, although i can't think of any common cases where you'd see that.
Finally, what happens when the table is overloaded is that it degenerates into a set of parallel linked lists - performance becomes O(n). Specifically, the number of links traversed will on average be half the load factor.

I agree with:
the general amortized complexity of O(1)
a bad hashCode() implementation could result to multiple collisions, which means that in the worst case every object goes to the same bucket, thus O(N) if each bucket is backed by a List.
since Java 8, HashMap dynamically replaces the Nodes (linked list) used in each bucket with TreeNodes (red-black tree when a list gets bigger than 8 elements) resulting to a worst performance of O(logN).
But, this is not the full truth if we want to be 100% precise. The implementation of hashCode() and the type of key Object (immutable/cached or being a Collection) might also affect real time complexity in strict terms.
Let's assume the following three cases:
HashMap<Integer, V>
HashMap<String, V>
HashMap<List<E>, V>
Do they have the same complexity? Well, the amortised complexity of the 1st one is, as expected, O(1). But, for the rest, we also need to compute hashCode() of the lookup element, which means we might have to traverse arrays and lists in our algorithm.
Lets assume that the size of all of the above arrays/lists is k.
Then, HashMap<String, V> and HashMap<List<E>, V> will have O(k) amortised complexity and similarly, O(k + logN) worst case in Java8.
*Note that using a String key is a more complex case, because it is immutable and Java caches the result of hashCode() in a private variable hash, so it's only computed once.
/** Cache the hash code for the string */
private int hash; // Default to 0
But, the above is also having its own worst case, because Java's String.hashCode() implementation is checking if hash == 0 before computing hashCode. But hey, there are non-empty Strings that output a hashcode of zero, such as "f5a5a608", see here, in which case memoization might not be helpful.

HashMap operation is dependent factor of hashCode implementation. For the ideal scenario lets say the good hash implementation which provide unique hash code for every object (No hash collision) then the best, worst and average case scenario would be O(1).
Let's consider a scenario where a bad implementation of hashCode always returns 1 or such hash which has hash collision. In this case the time complexity would be O(n).
Now coming to the second part of the question about memory, then yes memory constraint would be taken care by JVM.

In practice, it is O(1), but this actually is a terrible and mathematically non-sense simplification. The O() notation says how the algorithm behaves when the size of the problem tends to infinity. Hashmap get/put works like an O(1) algorithm for a limited size. The limit is fairly large from the computer memory and from the addressing point of view, but far from infinity.
When one says that hashmap get/put is O(1) it should really say that the time needed for the get/put is more or less constant and does not depend on the number of elements in the hashmap so far as the hashmap can be presented on the actual computing system. If the problem goes beyond that size and we need larger hashmaps then, after a while, certainly the number of the bits describing one element will also increase as we run out of the possible describable different elements. For example, if we used a hashmap to store 32bit numbers and later we increase the problem size so that we will have more than 2^32 bit elements in the hashmap, then the individual elements will be described with more than 32bits.
The number of the bits needed to describe the individual elements is log(N), where N is the maximum number of elements, therefore get and put are really O(log N).
If you compare it with a tree set, which is O(log n) then hash set is O(long(max(n)) and we simply feel that this is O(1), because on a certain implementation max(n) is fixed, does not change (the size of the objects we store measured in bits) and the algorithm calculating the hash code is fast.
Finally, if finding an element in any data structure were O(1) we would create information out of thin air. Having a data structure of n element I can select one element in n different way. With that, I can encode log(n) bit information. If I can encode that in zero bit (that is what O(1) means) then I created an infinitely compressing ZIP algorithm.

In simple word, If each bucket contain only single node then time complexity will be O(1). If bucket contain more than one node them time complexity will be O(linkedList size). which is always efficient than O(n).
hence we can say on an average case time complexity of put(K,V) function :
nodes(n)/buckets(N) = λ (lambda)
Example : 16/16 = 1
Time complexity will be O(1)

Java HashMap time complexity
--------------------------------
get(key) & contains(key) & remove(key) Best case Worst case
HashMap before Java 8, using LinkedList buckets 1 O(n)
HashMap after Java 8, using LinkedList buckets 1 O(n)
HashMap after Java 8, using Binary Tree buckets 1 O(log n)
put(key, value) Best case Worst case
HashMap before Java 8, using LinkedList buckets 1 1
HashMap after Java 8, using LinkedList buckets 1 1
HashMap after Java 8, using Binary Tree buckets 1 O(log n)
Hints:
Before Java 8, HashMap use LinkedList buckets
After Java 8, HashMap will use either LinkedList buckets or Binary Tree buckets according to the bucket size.
if(bucket size > TREEIFY_THRESHOLD[8]):
treeifyBin: The bucket will be a Balanced Binary Red-Black Tree
if(bucket size <= UNTREEIFY_THRESHOLD[6]):
untreeify: The bucket will be LinkedList (plain mode)

Related

Why does a HashMap contain a LinkedList instead of an AVL tree?

The instructor in this video explains that hash map implementations usually contain a linked list to chain values in case of collisions. My question is: Why not use something like an AVL tree (that takes O(log n) for insertions, deletions and lookups), instead of a linked list (that has a worst case lookup of O(n))?
I understand that hash functions should be designed such that collisions would be rare. But why not implement AVL trees anyway to optimize those rare cases?

It depends of the language implementing HashMap. I dont think this is a strict rule.
For example in Java:
What your video says is true up to Java 7.
In Java 8, the implementation of HashMap was changed to make use of red-black trees once the bucket grows beyond a certain point.
If your number of elements in the bucket is less than 8, it uses a singly linked list. Once it grows larger than 8 it becomes a tree. And reverts back to a singly linked list once it shrinks back to 6.
Why not just use a tree all the time? I guess this is a tradeoff between memory footprint vs lookup complexity within the bucket. Keep in mind that most hash functions will yield very few collisions, so maintaining a tree for buckets that have a size of 3 or 4 would be much more expensive for no good reason.
For reference, this is the Java 8 impl of an HashMap (and it actually has a quite good explanation about how the whole thing works, and why they chose 8 and 6, as "TREEIFY" and "UNTREEIFY" threshold) :
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/java/util/HashMap.java?av=f
And in Java 7:
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/java/util/HashMap.java?av=f

How can natural numbers be represented to offer constant time addition?

Cirdec's answer to a largely unrelated question made me wonder how best to represent natural numbers with constant-time addition, subtraction by one, and testing for zero.
Why Peano arithmetic isn't good enough:
Suppose we use
data Nat = Z | S Nat
Then we can write
Z + n = n
S m + n = S(m+n)
We can calculate m+n in O(1) time by placing m-r debits (for some constant r), one on each S constructor added onto n. To get O(1) isZero, we need to be sure to have at most p debits per S constructor, for some constant p. This works great if we calculate a + (b + (c+...)), but it falls apart if we calculate ((...+b)+c)+d. The trouble is that the debits stack up on the front end.
One option
The easy way out is to just use catenable lists, such as the ones Okasaki describes, directly. There are two problems:
O(n) space is not really ideal.
It's not entirely clear (at least to me) that the complexity of bootstrapped queues is necessary when we don't care about order the way we would for lists.

As far as I know, Idris (a dependently-typed purely functional language which is very close to Haskell) deals with this in a quite straightforward way. Compiler is aware of Nats and Fins (upper-bounded Nats) and replaces them with machine integer types and operations whenever possible, so the resulting code is pretty effective. However, that's not true for custom types (even isomorphic ones) as well as for compilation stage (there were some code samples using Nats for type checking which resulted in exponential growth in compile-time, I can provide them if needed).
In case of Haskell, I think a similar compiler extension may be implemented. Another possibility is to make TH macros which would transform the code. Of course, both of options aren't easy.

My understanding is that in basic computer programming terminology the underlying problem is you want to concatenate lists in constant time. The lists don't have cheats like forward references, so you can't jump to the end in O(1) time, for example.
You can use rings instead, which you can merge in O(1) time, regardless if a+(b+(c+...)) or ((...+c)+b)+a logic is used. The nodes in the rings don't need to be doubly linked, just a link to the next node.
Subtraction is the removal of any node, O(1), and testing for zero (or one) is trivial. Testing for n > 1 is O(n), however.
If you want to reduce space, then at each operation you can merge the nodes at the insertion or deletion points and weight the remaining ones higher. The more operations you do, the more compact the representation becomes! I think the worst case will still be O(n), however.

We know that there are two "extremal" solutions for efficient addition of natural numbers:
Memory efficient, the standard binary representation of natural numbers that uses O(log n) memory and requires O(log n) time for addition. (See also Chapter "Binary Representations" in the Okasaki's book.)
CPU efficient which use just O(1) time. (See Chapter "Structural Abstraction" in the book.) However, the solution uses O(n) memory as we'd represent natural number n as a list of n copies of ().
I haven't done the actual calculations, but I believe for the O(1) numerical addition we won't need the full power of O(1) FIFO queues, it'd be enough to bootstrap standard list [] (LIFO) in the same way. If you're interested, I could try to elaborate on that.
The problem with the CPU efficient solution is that we need to add some redundancy to the memory representation so that we can spare enough CPU time. In some cases, adding such a redundancy can be accomplished without compromising the memory size (like for O(1) increment/decrement operation). And if we allow arbitrary tree shapes, like in the CPU efficient solution with bootstrapped lists, there are simply too many tree shapes to distinguish them in O(log n) memory.
So the question is: Can we find just the right amount of redundancy so that sub-linear amount of memory is enough and with which we could achieve O(1) addition? I believe the answer is no:
Let's have a representation+algorithm that has O(1) time addition. Let's then have a number of the magnitude of m-bits, which we compute as a sum of 2^k numbers, each of them of the magnitude of (m-k)-bit. To represent each of those summands we need (regardless of the representation) minimum of (m-k) bits of memory, so at the beginning, we start with (at least) (m-k) 2^k bits of memory. Now at each of those 2^k additions, we are allowed to preform a constant amount of operations, so we are able to process (and ideally remove) total of C 2^k bits. Therefore at the end, the lower bound for the number of bits we need to represent the outcome is (m-k-C) 2^k bits. Since k can be chosen arbitrarily, our adversary can set k=m-C-1, which means the total sum will be represented with at least 2^(m-C-1) = 2^m/2^(C+1) ∈ O(2^m) bits. So a natural number n will always need O(n) bits of memory!

hashmap remove complexity

So a lot of sources say the hashmap remove function is O(1), but I don't see how this could be unless a hashmap were backed by a linkedlist because list removals are O(n). Could someone explain?

You can view a Hasmap as an array. Imagine, you want to store objects of all humans on earth somewhere. You could just get an unique number for everyone and use an array with a dimension of 10*10^20.
If someone is born, she/he gets the next free number and is added to the end. If someone dies, her/his number is used and the array entry is set to null.
You can easily see, to add some or to remove someone, you need only constant time. calculate array address, done (if you have random access memory).
What is added by the Hashmap? There are 2 motivations. On the one side, you do not want to have such a big array. If you only want to store 10 people from all over the world, nearly all entries of the array are free. On the other side, not all data you want to store somewhere have an unique number. Sometimes there are multiple times the same number, some numbers do now show overall and sometimes you do not have any number. Therefore, you define a function, which uses the big numbers from the input and reduce them to numbers in a smaller range. This reduction should be in a way, that the resulting number is most likely unique for different inputs.
Example: Lets say you want to store 10 numbers from 1 to 100000000. You could use an array with 100000000 indices. Or you could use an array with 100 indices and the function f(x) = x % 100. If you have the number 1234, f(1234) = 34. Mark 34 as assigned.
Now you could ask, what happens if you have the number 2234? We have a collision then. You need some strategy then to handle this, there are several. Study some literature or ask specific questions for this.
If you want to store a string, you could imagine to use the length or the sum of the ascii value from every characters.
As you see, we can easily store something, and easily access it again. What we have to do? Calculate the hash from the function (constant time for a good function), access the array (constant time), store or remove (constant time).
In real world, a good hash function is not that easy. Try to stick with the included ones in java.
If you want to read more details, the wikipedia article about hash table is a good starting point: http://en.wikipedia.org/wiki/Hash_table

I don't think the remove(key) complexity is O(1). If we have a big hash table with many collisions, then it would be O(n) in worst case. It very rare to get the worst case but we can't neglect the fact that O(1) is not guaranteed.

If your HashMap is backed by a LinkedList buckets array
The worst case of the remove function will be O(n)
If your HashMap is backed by a Balanced Binary Tree buckets array
The worst case of the remove function will be O(log n)
The best case and the average case (amortized complexity) of the remove function is O(1)

Constant-time hash for strings?

Another question on SO brought up the facilities in some languages to hash strings to give them a fast lookup in a table. Two examples of this are dictionary<> in .NET and the {} storage structure in Python. Other languages certainly support such a mechanism. C++ has its map, LISP has an equivalent, as do most other modern languages.
It was contended in the answers to the question that hash algorithms on strings can be conducted in constant timem with one SO member who has 25 years experience in programming claiming that anything can be hashed in constant time. My personal contention is that this is not true, unless your particular application places a boundary on the string length. This means that some constant K would dictate the maximal length of a string.
I am familiar with the Rabin-Karp algorithm which uses a hashing function for its operation, but this algorithm does not dictate a specific hash function to use, and the one the authors suggested is O(m), where m is the length of the hashed string.
I see some other pages such as this one (http://www.cse.yorku.ca/~oz/hash.html) that display some hash algorithms, but it seems that each of them iterates over the entire length of the string to arrive at its value.
From my comparatively limited reading on the subject, it appears that most associative arrays for string types are actually created using a hashing function that operates with a tree of some sort under the hood. This may be an AVL tree or red/black tree that points to the location of the value element in the key/value pair.
Even with this tree structure, if we are to remain on the order of theta(log(n)), with n being the number of elements in the tree, we need to have a constant-time hash algorithm. Otherwise, we have the additive penalty of iterating over the string. Even though theta(m) would be eclipsed by theta(log(n)) for indexes containing many strings, we cannot ignore it if we are in such a domain that the texts we search against will be very large.
I am aware that suffix trees/arrays and Aho-Corasick can bring the search down to theta(m) for a greater expense in memory, but what I am asking specifically if a constant-time hash method exists for strings of arbitrary lengths as was claimed by the other SO member.
Thanks.

A hash function doesn't have to (and can't) return a unique value for every string.
You could use the first 10 characters to initialize a random number generator and then use that to pull out 100 random characters from the string, and hash that. This would be constant time.
You could also just return the constant value 1. Strictly speaking, this is still a hash function, although not a very useful one.

In general, I believe that any complete string hash must use every character of the string and therefore would need to grow as O(n) for n characters. However I think for practical string hashes you can use approximate hashes that can easily be O(1).
Consider a string hash that always uses Min(n, 20) characters to compute a standard hash. Obviously this grows as O(1) with string size. Will it work reliably? It depends on your domain...

You cannot easily achieve a general constant time hashing algorithm for strings without risking severe cases of hash collisions.
For it to be constant time, you will not be able to access every character in the string. As a simple example, suppose we take the first 6 characters. Then comes someone and tries to hash an array of URLs. The has function will see "http:/" for every single string.
Similar scenarios may occur for other characters selections schemes. You could pick characters pseudo-randomly based on the value of the previous character, but you still run the risk of failing spectacularly if the strings for some reason have the "wrong" pattern and many end up with the same hash value.

You can hope for asymptotically less than linear hashing time if you use ropes instead of strings and have sharing that allows you to skip some computations. But obviously a hash function can not separate inputs that it has not read, so I wouldn't take the "everything can be hashed in constant time" too seriously.
Anything is possible in the compromise between the hash function's quality and the amount of computation it takes, and a hash function over long strings must have collisions anyway.
You have to determine if the strings that are likely to occur in your algorithm will collide too often if the hash function only looks at a prefix.

Although I cannot imagine a fixed-time hash function for unlimited length strings, there is really no need for it.
The idea behind using a hash function is to generate a distribution of the hash values that makes it unlikely that many strings would collide - for the domain under consideration. This key would allow direct access into a data store. These two combined result in a constant time lookup - on average.
If ever such collision occurs, the lookup algorithm falls back on a more flexible lookup sub-strategy.

Certainly this is doable, so long as you ensure all your strings are 'interned', before you pass them to something requiring hashing. Interning is the process of inserting the string into a string table, such that all interned strings with the same value are in fact the same object. Then, you can simply hash the (fixed length) pointer to the interned string, instead of hashing the string itself.

You may be interested in the following mathematical result I came up with last year.
Consider the problem of hashing an infinite number of keys—such as the set of all strings of any length—to the set of numbers in {1,2,…,b}. Random hashing proceeds by first picking at random a hash function h in a family of H functions.
I will show that there is always an infinite number of keys that are certain to collide over all H functions, that is, they always have the same hash value for all hash functions.
Pick any hash function h: there is at least one hash value y such that the set A={s:h(s)=y} is infinite, that is, you have infinitely many strings colliding. Pick any other hash function h‘ and hash the keys in the set A. There is at least one hash value y‘ such that the set A‘={s is in A: h‘(s)=y‘} is infinite, that is, there are infinitely many strings colliding on two hash functions. You can repeat this argument any number of times. Repeat it H times. Then you have an infinite set of strings where all strings collide over all of your H hash functions. CQFD.
Further reading:
Sensible hashing of variable-length strings is impossible
http://lemire.me/blog/archives/2009/10/02/sensible-hashing-of-variable-length-strings-is-impossible/

What's Up with O(1)?

I have been noticing some very strange usage of O(1) in discussion of algorithms involving hashing and types of search, often in the context of using a dictionary type provided by the language system, or using dictionary or hash-array types used using array-index notation.
Basically, O(1) means bounded by a constant time and (typically) fixed space. Some pretty fundamental operations are O(1), although using intermediate languages and special VMs tends to distort ones thinking here (e.g., how does one amortize the garbage collector and other dynamic processes over what would otherwise be O(1) activities).
But ignoring amortization of latencies, garbage-collection, and so on, I still don't understand how the leap to assumption that certain techniques that involve some kind of searching can be O(1) except under very special conditions.
Although I have noticed this before, an example just showed up in the Pandincus question, "'Proper’ collection to use to obtain items in O(1) time in C# .NET?".
As I remarked there, the only collection I know of that provides O(1) access as a guaranteed bound is a fixed-bound array with an integer index value. The presumption is that the array is implemented by some mapping to random access memory that uses O(1) operations to locate the cell having that index.
For collections that involve some sort of searching to determine the location of a matching cell for a different kind of index (or for a sparse array with integer index), life is not so easy. In particular, if there are collisons and congestion is possible, access is not exactly O(1). And if the collection is flexible, one must recognize and amortize the cost of expanding the underlying structure (such as a tree or a hash table) for which congestion relief (e.g., high collision incidence or tree imbalance).
I would never have thought to speak of these flexible and dynamic structures as O(1). Yet I see them offered up as O(1) solutions without any identification of the conditions that must be maintained to actually have O(1) access be assured (as well as have that constant be negligibly small).
THE QUESTION: All of this preparation is really for a question. What is the casualness around O(1) and why is it accepted so blindly? Is it recognized that even O(1) can be undesirably large, even though near-constant? Or is O(1) simply the appropriation of a computational-complexity notion to informal use? I'm puzzled.
UPDATE: The Answers and comments point out where I was casual about defining O(1) myself, and I have repaired that. I am still looking for good answers, and some of the comment threads are rather more interesting than their answers, in a few cases.

The problem is that people are really sloppy with terminology. There are 3 important but distinct classes here:
O(1) worst-case
This is simple - all operations take no more than a constant amount of time in the worst case, and therefore in all cases. Accessing an element of an array is O(1) worst-case.
O(1) amortized worst-case
Amortized means that not every operation is O(1) in the worst case, but for any sequence of N operations, the total cost of the sequence is no O(N) in the worst case. This means that even though we can't bound the cost of any single operation by a constant, there will always be enough "quick" operations to make up for the "slow" operations such that the running time of the sequence of operations is linear in the number of operations.
For example, the standard Dynamic Array which doubles its capacity when it fills up requires O(1) amortized time to insert an element at the end, even though some insertions require O(N) time - there are always enough O(1) insertions that inserting N items always takes O(N) time total.
O(1) average-case
This one is the trickiest. There are two possible definitions of average-case: one for randomized algorithms with fixed inputs, and one for deterministic algorithms with randomized inputs.
For randomized algorithms with fixed inputs, we can calculate the average-case running time for any given input by analyzing the algorithm and determining the probability distribution of all possible running times and taking the average over that distribution (depending on the algorithm, this may or may not be possible due to the Halting Problem).
In the other case, we need a probability distribution over the inputs. For example, if we were to measure a sorting algorithm, one such probability distribution would be the distribution that has all N! possible permutations of the input equally likely. Then, the average-case running time is the average running time over all possible inputs, weighted by the probability of each input.
Since the subject of this question is hash tables, which are deterministic, I'm going to focus on the second definition of average-case. Now, we can't always determine the probability distribution of the inputs because, well, we could be hashing just about anything, and those items could be coming from a user typing them in or from a file system. Therefore, when talking about hash tables, most people just assume that the inputs are well-behaved and the hash function is well behaved such that the hash value of any input is essentially randomly distributed uniformly over the range of possible hash values.
Take a moment and let that last point sink in - the O(1) average-case performance for hash tables comes from assuming all hash values are uniformly distributed. If this assumption is violated (which it usually isn't, but it certainly can and does happen), the running time is no longer O(1) on average.
See also Denial of Service by Algorithmic Complexity. In this paper, the authors discuss how they exploited some weaknesses in the default hash functions used by two versions of Perl to generate large numbers of strings with hash collisions. Armed with this list of strings, they generated a denial-of-service attack on some webservers by feeding them these strings that resulted in the worst-case O(N) behavior in the hash tables used by the webservers.

My understanding is that O(1) is not necessarily constant; rather, it is not dependent on the variables under consideration. Thus a hash lookup can be said to be O(1) with respect to the number of elements in the hash, but not with respect to the length of the data being hashed or ratio of elements to buckets in the hash.
The other element of confusion is that big O notation describes limiting behavior. Thus, a function f(N) for small values of N may indeed show great variation, but you would still be correct to say it is O(1) if the limit as N approaches infinity is constant with respect to N.

O(1) means constant time and (typically) fixed space
Just to clarify these are two separate statements. You can have O(1) in time but O(n) in space or whatever.
Is it recognized that even O(1) can be undesirably large, even though near-constant?
O(1) can be impractically HUGE and it's still O(1). It is often neglected that if you know you'll have a very small data set the constant is more important than the complexity, and for reasonably small data sets, it's a balance of the two. An O(n!) algorithm can out-perform a O(1) if the constants and sizes of the data sets are of the appropriate scale.
O() notation is a measure of the complexity - not the time an algorithm will take, or a pure measure of how "good" a given algorithm is for a given purpose.

I can see what you're saying, but I think there are a couple of basic assumptions underlying the claim that look-ups in a Hash table have a complexity of O(1).
The hash function is reasonably designed to avoid a large number of collisions.
The set of keys is pretty much randomly distributed, or at least not purposely designed to make the hash function perform poorly.
The worst case complexity of a Hash table look-up is O(n), but that's extremely unlikely given the above 2 assumptions.

Hashtables is a data structure that supports O(1) search and insertion.
A hashtable usually has a key and value pair, where the key is used to as the parameter to a function (a hash function) which will determine the location of the value in its internal data structure, usually an array.
As insertion and search only depends upon the result of the hash function and not on the size of the hashtable nor the number of elements stored, a hashtable has O(1) insertion and search.
There is one caveat, however. That is, as the hashtable becomes more and more full, there will be hash collisions where the hash function will return an element of an array which is already occupied. This will necesitate a collision resolution in order to find another empty element.
When a hash collision occurs, a search or insertion cannot be performed in O(1) time. However, good collision resolution algorithms can reduce the number of tries to find another suiteable empty spot or increasing the hashtable size can reduce the number of collisions in the first place.
So, in theory, only a hashtable backed by an array with an infinite number of elements and a perfect hash function would be able to achieve O(1) performance, as that is the only way to avoid hash collisions that drive up the number of required operations. Therefore, for any finite-sized array will at one time or another be less than O(1) due to hash collisions.
Let's take a look at an example. Let's use a hashtable to store the following (key, value) pairs:
(Name, Bob)
(Occupation, Student)
(Location, Earth)
We will implement the hashtable back-end with an array of 100 elements.
The key will be used to determine an element of the array to store the (key, value) pair. In order to determine the element, the hash_function will be used:
hash_function("Name") returns 18
hash_function("Occupation") returns 32
hash_function("Location") returns 74.
From the above result, we'll assign the (key, value) pairs into the elements of the array.
array[18] = ("Name", "Bob")
array[32] = ("Occupation", "Student")
array[74] = ("Location", "Earth")
The insertion only requires the use of a hash function, and does not depend on the size of the hashtable nor its elements, so it can be performed in O(1) time.
Similarly, searching for an element uses the hash function.
If we want to look up the key "Name", we'll perform a hash_function("Name") to find out which element in the array the desired value resides.
Also, searching does not depend on the size of the hashtable nor the number of elements stored, therefore an O(1) operation.
All is well. Let's try to add an additional entry of ("Pet", "Dog"). However, there is a problem, as hash_function("Pet") returns 18, which is the same hash for the "Name" key.
Therefore, we'll need to resolve this hash collision. Let's suppose that the hash collision resolving function we used found that the new empty element is 29:
array[29] = ("Pet", "Dog")
Since there was a hash collision in this insertion, our performance was not quite O(1).
This problem will also crop up when we try to search for the "Pet" key, as trying to find the element containing the "Pet" key by performing hash_function("Pet") will always return 18 initially.
Once we look up element 18, we'll find the key "Name" rather than "Pet". When we find this inconsistency, we'll need to resolve the collision in order to retrieve the correct element which contains the actual "Pet" key. Resovling a hash collision is an additional operation which makes the hashtable not perform at O(1) time.

I can't speak to the other discussions you've seen, but there is at least one hashing algorithm that is guaranteed to be O(1).
Cuckoo hashing maintains an invariant so that there is no chaining in the hash table. Insertion is amortized O(1), retrieval is always O(1). I've never seen an implementation of it, it's something that was newly discovered when I was in college. For relatively static data sets, it should be a very good O(1), since it calculates two hash functions, performs two lookups, and immediately knows the answer.
Mind you, this is assuming the hash calcuation is O(1) as well. You could argue that for length-K strings, any hash is minimally O(K). In reality, you can bound K pretty easily, say K < 1000. O(K) ~= O(1) for K < 1000.

There may be a conceptual error as to how you're understanding Big-Oh notation. What it means is that, given an algorithm and an input data set, the upper bound for the algorithm's run time depends on the value of the O-function when the size of the data set tends to infinity.
When one says that an algorithm takes O(n) time, it means that the runtime for an algorithm's worst case depends linearly on the size of the input set.
When an algorithm takes O(1) time, the only thing it means is that, given a function T(f) which calculates the runtime of a function f(n), there exists a natural positive number k such that T(f) < k for any input n. Essentially, it means that the upper bound for the run time of an algorithm is not dependent on its size, and has a fixed, finite limit.
Now, that does not mean in any way that the limit is small, just that it's independent of the size of the input set. So if I artificially define a bound k for the size of a data set, then its complexity will be O(k) == O(1).
For example, searching for an instance of a value on a linked list is an O(n) operation. But if I say that a list has at most 8 elements, then O(n) becomes O(8) becomes O(1).
In this case, it we used a trie data structure as a dictionary (a tree of characters, where the leaf node contains the value for the string used as key), if the key is bounded, then its lookup time can be considered O(1) (If I define a character field as having at most k characters in length, which can be a reasonable assumption for many cases).
For a hash table, as long as you assume that the hashing function is good (randomly distributed) and sufficiently sparse so as to minimize collisions, and rehashing is performed when the data structure is sufficiently dense, you can indeed consider it an O(1) access-time structure.
In conclusion, O(1) time may be overrated for a lot of things. For large data structures the complexity of an adequate hash function may not be trivial, and sufficient corner cases exist where the amount of collisions lead it to behave like an O(n) data structure, and rehashing may become prohibitively expensive. In which case, an O(log(n)) structure like an AVL or a B-tree may be a superior alternative.

In general, I think people use them comparatively without regard to exactness. For example, hash-based data structures are O(1) (average) look up if designed well and you have a good hash. If everything hashes to a single bucket, then it's O(n). Generally, though one uses a good algorithm and the keys are reasonably distributed so it's convenient to talk about it as O(1) without all the qualifications. Likewise with lists, trees, etc. We have in mind certain implementations and it's simply more convenient to talk about them, when discussing generalities, without the qualifications. If, on the other hand, we're discussing specific implementations, then it probably pays to be more precise.

HashTable looks-ups are O(1) with respect to the number of items in the table, because no matter how many items you add to the list the cost of hashing a single item is pretty much the same, and creating the hash will tell you the address of the item.
To answer why this is relevant: the OP asked about why O(1) seemed to be thrown around so casually when in his mind it obviously could not apply in many circumstances. This answer explains that O(1) time really is possible in those circumstances.

Hash table implementations are in practice not "exactly" O(1) in use, if you test one you'll find they average around 1.5 lookups to find a given key across a large dataset
( due to to the fact that collisions DO occur, and upon colliding, a different location must be assigned )
Also, In practice, HashMaps are backed by arrays with an initial size, that is "grown" to double size when it reaches 70% fullness on average, which gives a relatively good addressing space. After 70% fullness collision rates grow faster.
Big O theory states that if you have a O(1) algorithm, or even an O(2) algorithm, the critical factor is the degree of the relation between input-set size and steps to insert/fetch one of them. O(2) is still constant time, so we just approximate it as O(1), because it means more or less the same thing.
In reality, there is only 1 way to have a "perfect hashtable" with O(1), and that requires:
A Global Perfect Hash Key Generator
An Unbounded addressing space.
( Exception case: if you can compute in advance all the permutations of permitted keys for the system, and your target backing store address space is defined to be the size where it can hold all keys that are permitted, then you can have a perfect hash, but its a "domain limited" perfection )
Given a fixed memory allocation, it is not plausible in the least to have this, because it would assume that you have some magical way to pack an infinite amount of data into a fixed amount of space with no loss of data, and that's logistically impossible.
So retrospectively, getting O(1.5) which is still constant time, in a finite amount of memory with even a relatively Naïve hash key generator, I consider pretty damn awesome.
Suffixory note Note I use O(1.5) and O(2) here. These actually don't exist in big-o. These are merely what people whom don't know big-o assume is the rationale.
If something takes 1.5 steps to find a key, or 2 steps to find that key, or 1 steps to find that key, but the number of steps never exceeds 2 and whether it takes 1 step or 2 is completely random, then it is still Big-O of O(1). This is because no matter how many items to you add to the dataset size, It still maintains the <2 steps. If for all tables > 500 keys it takes 2 steps, then you can assume those 2 steps are in fact one-step with 2 parts, ... which is still O(1).
If you can't make this assumption, then your not being Big-O thinking at all, because then you must use the number which represents the number of finite computational steps required to do everything and "one-step" is meaningless to you. Just get into your head that there is NO direct correlation between Big-O and number of execution cycles involved.

O(1) means, exactly, that the algorithm's time complexity is bounded by a fixed value. This doesn't mean it's constant, only that it is bounded regardless of input values. Strictly speaking, many allegedly O(1) time algorithms are not actually O(1) and just go so slowly that they are bounded for all practical input values.

Yes, garbage collection does affect the asymptotic complexity of algorithms running in the garbage collected arena. It is not without cost, but it is very hard to analyze without empirical methods, because the interaction costs are not compositional.
The time spent garbage collecting depends on the algorithm being used. Typically modern garbage collectors toggle modes as memory fills up to keep these costs under control. For instance, a common approach is to use a Cheney style copy collector when memory pressure is low because it pays cost proportional to the size of the live set in exchange for using more space, and to switch to a mark and sweep collector when memory pressure becomes greater, because even though it pays cost proportional to the live set for marking and to the whole heap or dead set for sweeping. By the time you add card-marking and other optimizations, etc. the worst case costs for a practical garbage collector may actually be a fair bit worse, picking up an extra logarithmic factor for some usage patterns.
So, if you allocate a big hash table, even if you access it using O(1) searches for all time during its lifetime, if you do so in a garbage collected environment, occasionally the garbage collector will traverse the entire array, because it is size O(n) and you will pay that cost periodically during collection.
The reason we usually leave it off of the complexity analysis of algorithms is that garbage collection interacts with your algorithm in non-trivial ways. How bad of a cost it is depends a lot on what else you are doing in the same process, so the analysis is not compositional.
Moreover, above and beyond the copy vs. compact vs. mark and sweep issue, the implementation details can drastically affect the resulting complexities:
Incremental garbage collectors that track dirty bits, etc. can all but make those larger re-traversals disappear.
It depends on whether your GC works periodically based on wall-clock time or runs proportional to the number of allocations.
Whether a mark and sweep style algorithm is concurrent or stop-the-world
Whether it marks fresh allocations black if it leaves them white until it drops them into a black container.
Whether your language admits modifications of pointers can let some garbage collectors work in a single pass.
Finally, when discussing an algorithm, we are discussing a straw man. The asymptotics will never fully incorporate all of the variables of your environment. Rarely do you ever implement every detail of a data structure as designed. You borrow a feature here and there, you drop a hash table in because you need fast unordered key access, you use a union-find over disjoint sets with path compression and union by rank to merge memory-regions over there because you can't afford to pay a cost proportional to the size of the regions when you merge them or what have you. These structures are thought primitives and the asymptotics help you when planning overall performance characteristics for the structure 'in-the-large' but knowledge of what the constants are matters too.
You can implement that hash table with perfectly O(1) asymptotic characteristics, just don't use garbage collection; map it into memory from a file and manage it yourself. You probably won't like the constants involved though.

I think when many people throw around the term "O(1)" they implicitly have in mind a "small" constant, whatever "small" means in their context.
You have to take all this big-O analysis with context and common sense. It can be an extremely useful tool or it can be ridiculous, depending on how you use it.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string