Why don't Python sets preserve insertion order like dicts? - python-3.x

I was surprised to discover recently that while dicts are guaranteed to preserve insertion order in Python 3.7+, sets are not:
>>> d = {'a': 1, 'b': 2, 'c': 3}
>>> d
{'a': 1, 'b': 2, 'c': 3}
>>> d['d'] = 4
>>> d
{'a': 1, 'b': 2, 'c': 3, 'd': 4}
>>> s = {'a', 'b', 'c'}
>>> s
{'b', 'a', 'c'}
>>> s.add('d')
>>> s
{'d', 'b', 'a', 'c'}
What is the rationale for this difference? Do the same efficiency improvements that led the Python team to change the dict implementation not apply to sets as well?
I'm not looking for pointers to ordered-set implementations or ways to use dicts as stand-ins for sets. I'm just wondering why the Python team didn't make built-in sets preserve order at the same time they did so for dicts.

Sets and dicts are optimized for different use-cases. The primary use of a set is fast membership testing, which is order-agnostic. For dicts, the cost of a lookup is the most critical operation, and the key is more likely to be present. With sets, the presence or absence of an element is not known in advance, so the set implementation needs to optimize for both the found and the not-found case. Also, some optimizations for common set operations, such as union and intersection, make it difficult to retain set ordering without degrading performance.
While both data structures are hash-based, it's a common misconception that sets are just implemented as dicts with null values. Even before the compact dict implementation landed in CPython 3.6, the set and dict implementations already differed significantly, with little code reuse. For example, both use open addressing, but dicts rely on pseudo-random probing while sets add an initial linear-probing phase to improve cache locality. The initial linear probe (LINEAR_PROBES = 9 by default in CPython) checks a series of adjacent key/hash pairs, improving performance by reducing the cost of hash-collision handling - consecutive memory accesses are cheaper than scattered probes.
dictobject.c - master, v3.5.9
setobject.c - master, v3.5.9
issue18771 - changeset to reduce the cost of hash collisions for set objects in Python 3.4.
It would be possible in theory to change CPython's set implementation to be similar to the compact dict, but in practice there are drawbacks, and notable core developers were opposed to making such a change.
Sets remain unordered. (Why? The usage patterns are different. Also, different implementation.)
– Guido van Rossum
Sets use a different algorithm that isn't as amenable to retaining insertion order.
Set-to-set operations lose their flexibility and optimizations if order is required. Set mathematics are defined in terms of unordered sets. In short, set ordering isn't in the immediate future.
– Raymond Hettinger
A detailed discussion about whether to compactify sets for 3.7, and why it was decided against, can be found on the python-dev mailing list.
In summary, the main points are: usage patterns differ (insertion-ordered dicts are useful, e.g. for **kwargs, while ordering matters less for sets); the space savings from compacting sets are less significant (only the key and hash arrays can be densified, as opposed to key, hash, and value arrays); and the linear-probing optimization that sets currently use is incompatible with a compact implementation.
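To make the space argument concrete, here is a rough, informal check (my own sketch, not from the thread). sys.getsizeof only measures the container itself, not the keys, and the exact numbers (and which container is larger) vary by CPython version and platform:
import sys

keys = [f"key{i}" for i in range(1000)]

d = dict.fromkeys(keys)   # compact dict: index table + dense key/hash/value entries
s = set(keys)             # set table: key/hash slots only

# Container overhead only; run it on your interpreter to see the actual gap.
print("dict:", sys.getsizeof(d), "bytes")
print("set: ", sys.getsizeof(s), "bytes")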
I will reproduce Raymond's post below which covers the most important points.
On Sep 14, 2016, at 3:50 PM, Eric Snow wrote:
Then, I'll do same to sets.
Unless I've misunderstood, Raymond was opposed to making a similar change to set.
That's right. Here are a few thoughts on the subject before people start running wild.
For the compact dict, the space savings was a net win, with the additional space consumed by the indices and the overallocation for the key/value/hash arrays being more than offset by the improved density of the key/value/hash arrays. However, for sets the net was much less favorable, because we still need the indices and overallocation but can only offset the space cost by densifying two of the three arrays. In other words, compacting makes more sense when you have wasted space for keys, values, and hashes. If you lose one of those three, it stops being compelling.
The use pattern for sets is different from dicts. The former has more hit-or-miss lookups; the latter tends to have fewer missing-key lookups. Also, some of the optimizations for the set-to-set operations make it difficult to retain set ordering without impacting performance.
I pursued an alternative path to improve set performance. Instead of compacting (which wasn't much of a space win and incurred the cost of an additional indirection), I added linear probing to reduce the cost of collisions and improve cache performance. This improvement is incompatible with the compacting approach I advocated for dictionaries.
For now, the ordering side-effect on dictionaries is non-guaranteed, so it is premature to start insisting that sets become ordered as well. The docs already link to a recipe for creating an OrderedSet (https://code.activestate.com/recipes/576694/), but it seems like the uptake has been nearly zero. Also, now that Eric Snow has given us a fast OrderedDict, it is easier than ever to build an OrderedSet from MutableSet and OrderedDict, but again I haven't observed any real interest, because typical set-to-set data analytics don't really need or care about ordering. Likewise, the primary use of fast membership testing is order-agnostic.
That said, I do think there is room to add alternative set implementations to PyPI. In particular, there are some interesting special cases for orderable data where set-to-set operations can be sped up by comparing entire ranges of keys (see https://code.activestate.com/recipes/230113-implementation-of-sets-using-sorted-lists for a starting point). IIRC, PyPI already has code for set-like bloom filters and cuckoo hashing.
I understand that it is exciting to have a major block of code accepted into the Python core, but that shouldn't open the floodgates to engaging in more major rewrites of other datatypes unless we're sure that it is warranted.
– Raymond Hettinger
From [Python-Dev] Python 3.6 dict becomes compact and gets a private version; and keywords become ordered, Sept 2016.
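As a side note, the OrderedSet that Raymond mentions building from MutableSet and an ordered mapping is only a few lines today. A minimal sketch (my own illustration, not the ActiveState recipe he links to), relying on the fact that plain dicts preserve insertion order in Python 3.7+:
from collections.abc import MutableSet

class OrderedSet(MutableSet):
    def __init__(self, iterable=()):
        self._data = dict.fromkeys(iterable)   # keys only; all values are None

    def __contains__(self, item):
        return item in self._data

    def __iter__(self):
        return iter(self._data)                # iterates in insertion order

    def __len__(self):
        return len(self._data)

    def add(self, item):
        self._data[item] = None

    def discard(self, item):
        self._data.pop(item, None)

    def __repr__(self):
        return f"OrderedSet({list(self._data)!r})"

s = OrderedSet("cabbage")
s.add("z")
print(s)   # OrderedSet(['c', 'a', 'b', 'g', 'e', 'z'])
The MutableSet mixins supply the usual set operators (|, &, -) on top of the five methods defined here, so this stays a sketch rather than a full reimplementation.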

Discussions
Your question is germane and has already been discussed at length on python-dev. R. Hettinger shared a list of rationales in that thread. The issue remained open-ended shortly after a detailed reply from T. Peters. Some time later (c. 2022), the discussion reignited on python-ideas.
In short, the implementation of the modern dict that preserves insertion order is specific to dicts and is not considered appropriate for sets. In particular, dicts are used everywhere to run Python itself (e.g. the __dict__ namespace of every object). A major motivation behind the modern dict was to reduce its size, making Python more memory-efficient overall. Sets, by contrast, are far less prevalent in Python's core, which weakens the case for a similar refactoring. See also R. Hettinger's talk on the modern dict implementation.
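One small illustration (my own) of how central ordered dicts are to the language itself: every instance's __dict__ is an ordinary dict, so attribute order reflects assignment order, guaranteed since Python 3.7.
class Point:
    def __init__(self, x, y):
        self.x = x      # inserted into the instance __dict__ first
        self.y = y      # inserted second

p = Point(1, 2)
print(vars(p))          # {'x': 1, 'y': 2} -- insertion (assignment) order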
Perspectives
The unordered nature of sets in Python parallels the behavior of mathematical sets. Order is not guaranteed.
The corresponding mathematical concept is unordered and it would be weird to impose such an order. – R. Hettinger
If order of any kind were introduced to sets in Python, this behavior would correspond to a completely separate mathematical structure, namely an ordered set (or oset). Osets play a separate role in mathematics, particularly in combinatorics. One practical application of osets is observed in the change ringing of bells.
Having unordered sets is consistent with a very generic and ubiquitous structure that underpins most of modern mathematics, i.e. set theory. I submit that unordered sets in Python are good to have.
See also related posts that expand on this topic:
Converting a list to a set changes element order
Get unique values from a list in python
Does Python have an ordered set

Related

Haskell alternative for Doubly-linked-list coupled with Hash-table pattern

There's a useful pattern in imperative programming, namely, a doubly-linked-list coupled with a hash-table for constant time lookup in the linked list.
One application of this pattern is an LRU cache. The head of the doubly-linked list contains the least recently used entry in the cache, and the last element of the doubly-linked list contains the most recently used entry. The keys in the hash-table are the keys of the entries, and the values are pointers to the nodes in the linked list corresponding to each key/entry. When an entry is queried in the cache, the hash-table is used to find its node in the linked list; the node is then removed from its current location and placed at the end of the linked list, making it the most recently used entry. For eviction, we simply remove entries from the head of the linked list, as they are the least recently used ones. Both lookup and eviction take constant time.
I can think of implementing this in Haskell using two TreeMaps, and I know that the time complexity will be O(log n). But I am a little uncomfortable, as the constant factor in the time complexity seems a little high. Specifically, to perform a lookup, I first need to check if the entry exists and save its value, then delete it from the LRU map and re-insert it with a new key. This means that each lookup results in three root-to-node traversals.
Is there a better way of doing this in Haskell?
As comments indicate, mutable vectors are perfectly acceptable when required. However, I think there's an issue with the way you've stated the question - unless the idea is to duplicate "as closely as possible" (without mutable structures) the imperative code, why bother having two treemaps? A single priority search queue (see the packages pqueue or PSQueue) would be an appropriate structure whilst maintaining purity. It efficiently supports both priorities (for eviction) and searching (for lookups of your desired cached key).
On a related note, some structures support e.g. Data.Map's alterF, which effectively provides you with a continuation allowing you to "do something else" depending on the Maybe value at a key, but "remembering" where you are, and thus avoiding paying the full cost of re-traversing the structure to subsequently modify it at this key. See also the at lens.
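For comparison with the imperative pattern described in the question, here is a minimal sketch in Python (the language used earlier in this document): collections.OrderedDict is itself a hash table whose entries are also linked in insertion order, so it gives the O(1) lookup/reorder/evict operations directly.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()           # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the least recently used entry

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")                 # "a" is now most recently used
cache.put("c", 3)              # evicts "b"
print(list(cache._data))       # ['a', 'c']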

What is the collision resolution mechanism for v8's associative array?

Which mechanism does it use from the following?
https://en.wikipedia.org/wiki/Hash_table#Collision_resolution
Open addressing with quadratic probing (reference: source code).
Note 1: Not everything that acts like an associative array is actually implemented as a hash table under the hood. In particular, small/dense arrays like [3, 1, 4, 1.5] are backed by an actual array (similar to a C array) for fast index-based access.
Note 2: The answer to this question may or may not change over time if/when the team experiments with alternative implementations. For example, open addressing requires relatively low load factors in order to provide quick accesses; it would be interesting to find an implementation that's more memory efficient (without being slower).
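As a toy illustration of the general technique named above (open addressing with quadratic probing) - not of V8's actual code - here is a hypothetical Python sketch. With a power-of-two capacity, the triangular-number probe sequence visits every slot exactly once:
class QuadraticProbingTable:
    def __init__(self, capacity=8):            # power-of-two capacity
        self._slots = [None] * capacity        # each slot holds (key, value) or None

    def _probe(self, key):
        capacity = len(self._slots)
        index = hash(key) % capacity
        for step in range(capacity):           # offsets 0, 1, 3, 6, 10, ... (triangular)
            yield index
            index = (index + step + 1) % capacity

    def put(self, key, value):
        for index in self._probe(key):
            slot = self._slots[index]
            if slot is None or slot[0] == key:
                self._slots[index] = (key, value)
                return
        raise RuntimeError("table full -- a real table would grow before this point")

    def get(self, key, default=None):
        for index in self._probe(key):
            slot = self._slots[index]
            if slot is None:                   # an empty slot ends the probe chain
                return default
            if slot[0] == key:
                return slot[1]
        return default

table = QuadraticProbingTable()
table.put("name", "Ada")
table.put("lang", "JS")
print(table.get("name"))     # Ada
print(table.get("absent"))   # None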

Mind blown: RDD.zip() method

I just discovered the RDD.zip() method and I cannot imagine what its contract could possibly be.
I understand what it does, of course. However, it has always been my understanding that
the order of elements in an RDD is a meaningless concept
the number of partitions and their sizes is an implementation detail only available to the user for performance tuning
In other words, an RDD is a (multi)set, not a sequence (and, of course, in, e.g., Python one gets AttributeError: 'set' object has no attribute 'zip')
What is wrong with my understanding above?
What was the rationale behind this method?
Is it legal outside the trivial context like a.map(f).zip(a)?
EDIT 1:
Another crazy method is zipWithIndex(), as well as the various zipPartitions() variants.
Note that first() and take() are not crazy because they are just (non-random) samples of the RDD.
collect() is also okay - it just converts a set to a sequence which is perfectly legit.
EDIT 2: The reply says:
when you compute one RDD from another the order of elements in the new RDD may not correspond to that in the old one.
This appears to imply that even the trivial a.map(f).zip(a) is not guaranteed to be equivalent to a.map(x => (f(x),x)). What is the situation when zip() results are reproducible?
It is not true that RDDs are always unordered. An RDD has a guaranteed order if it is the result of a sortBy operation, for example. An RDD is not a set; it can contain duplicates. Partitioning is not opaque to the caller, and can be controlled and queried. Many operations do preserve both partitioning and order, like map. That said, I find it a little too easy to accidentally violate the assumptions that zip depends on, since they're a little subtle, but it certainly has a purpose.
The mental model I use (and recommend) is that the elements of an RDD are ordered, but when you compute one RDD from another the order of elements in the new RDD may not correspond to that in the old one.
For those who want to be aware of partitions, I'd say that:
The partitions of an RDD have an order.
The elements within a partition have an order.
If you think of "concatenating" the partitions (say laying them "end to end" in order) using the order of elements within them, the overall ordering you end up with corresponds to the order of elements if you ignore partitions.
But again, if you compute one RDD from another, all bets about the order relationships of the two RDDs are off.
Several members of the RDD class (I'm referring to the Scala API) strongly suggest an order concept (as does their documentation):
collect()
first()
partitions
take()
zipWithIndex()
as does Partition.index as well as SparkContext.parallelize() and SparkContext.makeRDD() (which both take a Seq[T]).
In my experience these ways of "observing" order give results that are consistent with each other, and the ones that translate back and forth between RDDs and ordered Scala collections behave as you would expect -- they preserve the overall order of elements. This is why I say that, in practice, RDDs have a meaningful order concept.
Furthermore, while there are obviously many situations where computing an RDD from another must change the order, in my experience order tends to be preserved where it is possible/reasonable to do so. Operations that don't re-partition and don't fundamentally change the set of elements especially tend to preserve order.
But this brings me to your question about "contract", and indeed the documentation has a problem in this regard. I have not seen a single place where an operation's effect on element order is made clear. (The OrderedRDDFunctions class doesn't count, because it refers to an ordering based on the data, which may differ from the raw order of elements within the RDD. Likewise the RangePartitioner class.) I can see how this might lead you to conclude that there is no concept of element order, but the examples I've given above make that model unsatisfying to me.
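For concreteness, a small PySpark sketch of the patterns discussed above (assuming pyspark is installed; the context setup and names are illustrative). The trivial a.map(f).zip(a) works because map preserves both the partitioning and the per-partition order:
from pyspark import SparkContext

sc = SparkContext("local[2]", "zip-demo")     # hypothetical local context

a = sc.parallelize([1, 2, 3, 4], numSlices=2)

pairs = a.map(lambda x: x * 10).zip(a)        # same partitioning, same per-partition order
print(pairs.collect())                        # [(10, 1), (20, 2), (30, 3), (40, 4)]

print(a.zipWithIndex().collect())             # [(1, 0), (2, 1), (3, 2), (4, 3)]

sc.stop()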

Converting graph to canonical string

I'm looking for a way of storing graphs as strings. The strings are to be used as keys in a map, so that two topologically identical graphs will map to the same value in the map. Does anybody know of such an algorithm?
The nodes of the graph are labeled, with duplicate labels being allowed.
The program is in Java, and an implementation in that language would be neat, but any pointers to possible algorithms are appreciated.
If you have an algorithm that maps general graphs to strings such that two graphs map to the same string if and only if they are topologically equivalent, then you have an algorithm for GRAPH ISOMORPHISM. Graph isomorphism has no known polynomial-time algorithm. So you can't (easily :) have a polynomial-time algorithm that calculates the strings as you postulate them, because otherwise you'd have constructed a previously unknown and very efficient algorithm for graph isomorphism.
This doesn't mean that it wouldn't be possible to solve the problem for your class of graphs; it just means that for the class of all graphs it's kind of difficult.
You may find the following question relevant...
Using finite automata as keys to a container
Basically, an automaton can be minimised using well-known algorithms from automata-theory textbooks; Hopcroft's algorithm is an example. There is precisely one minimal automaton that is equivalent to any given automaton. However, that minimal automaton may be represented in different ways. Constructing a safe canonical form is basically a matter of renumbering the nodes and ordering the adjacency table using information that is significant in defining the automaton, and not information that is specific to the representation.
The basic principle should extend to general graphs. Whether you can minimise your graphs depends on their semantics, but the basic idea of renumbering the nodes and sorting the adjacency list still applies.
Other answers here assume things about your graphs - for example, that the nodes have unique labels that can be ordered, that are significant for the semantics of your graphs, and that can be used to identify the nodes in an adjacency matrix or list. This simply won't work if you're interested in morphisms of unlabelled graphs, for instance. Different ways of numbering the nodes (and thus ordering the adjacency list) will result in different canonical forms for equivalent graphs that just happen to be represented differently.
As for the renumbering etc, an approach is to borrow and adapt principles from automata minimisation algorithms. Basically...
Create a vector of blocks (sets of nodes). Initially, populate this with one block per class of nodes (ie per distinct node annotation). The modification here is that we order these by annotation details (not by representation-specific node IDs).
For each class (annotation) of edges in order, evaluate each block. If each node in the block can follow the current edge-type to reach the same set of next blocks, leave it untouched. Otherwise, split it as necessary to get maximal blocks that achieve this objective. Keep these split blocks clustered together in the vector (preserve the existing ordering, just refine it a bit), and order the split blocks based on a suitable ordering of the next-block sets. For example, use bitvectors as long as the current vector of blocks, with a set bit for each block reachable by following the current edge type. To order the bitvectors, treat them as big integers.
EDIT - I forgot to mention - in the second bullet, as soon as you split a block, you restart with the first block in the vector and the first edge annotation. Obviously, a naive implementation will be slow, so take the principle and use it to adapt Hopcroft's minimisation algorithm.
If you end up with blocks that have multiple nodes in them, those nodes are equivalent. Whether that means they can be merged or not depends on your semantics, but the relative ordering of nodes within each such block clearly doesn't matter.
If dealing with graphs that can be minimised (e.g. automaton digraphs) I suspect it's best to minimise first, though I still haven't got around to implementing this myself.
The key thing is, of course, ensuring that your renumbering is sensitive only to the significant details of the graph - its structure and annotations - and not the things that are only there so that you can construct a representation such as node IDs/addresses etc.
Once you have the blocks ordered, deriving a canonical form should be easy.
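As a baseline to make the goal precise (this is not the block-refinement procedure sketched above, but a hypothetical brute-force canonicalisation of my own): try every renumbering of the nodes and keep the lexicographically smallest encoding, so two labelled graphs get the same string if and only if they are isomorphic (labels included). It is factorial-time and only viable for small graphs.
from itertools import permutations

def canonical_string(labels, edges):
    # labels: list of node labels (index = node id); edges: set of (u, v) pairs.
    # Edges are treated as directed; for undirected graphs, add both directions.
    n = len(labels)
    best = None
    for perm in permutations(range(n)):          # perm[old_id] = new_id
        relabelled_nodes = sorted((perm[v], labels[v]) for v in range(n))
        relabelled_edges = sorted((perm[u], perm[v]) for (u, v) in edges)
        encoding = repr((relabelled_nodes, relabelled_edges))
        if best is None or encoding < best:      # keep the smallest encoding
            best = encoding
    return best

# Two renderings of the same labelled 3-cycle produce the same string:
g1 = canonical_string(["A", "A", "B"], {(0, 1), (1, 2), (2, 0)})
g2 = canonical_string(["B", "A", "A"], {(0, 1), (1, 2), (2, 0)})
print(g1 == g2)   # True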
gSpan introduced the 'minimum DFS code' which encodes graphs such that if two graphs have the same code, they must be isomorphic. gSpan has implementations in C++ and Java.
A common way to do this is using adjacency lists.
Besides an adjacency list, there are adjacency matrices. Which one you choose should depend on which you use to implement your Graph class (adjacency lists are usually the better choice, but both have strengths and weaknesses). If you have a totally different implementation of Graph, consider using one of these, as they make many graph algorithms very easy to implement.
One other option is, if possible, overriding hashCode() and equals() on the Graph class and use the actual graph object as the key rather than converting to a string.
Edit: overriding hashCode() and equals() is the route I would take if some vertices are not uniquely labeled. As noted in the comments, this can be expensive, but I think it would depend on the implementation of the Graph class.
If equals() is too expensive, then you should use an adjacency list or matrix, but don't just use the node names. You have to carefully specify exactly what it is that identifies individual graphs and vertices (and therefore what would make them equal), and then make your string representation of the adjacency list use those properties instead of the node names. I'd suggest you write this specification of your graph equals operation down.

What's Up with O(1)?

I have been noticing some very strange usage of O(1) in discussion of algorithms involving hashing and types of search, often in the context of using a dictionary type provided by the language system, or dictionary or hash-array types accessed using array-index notation.
Basically, O(1) means bounded by a constant time and (typically) fixed space. Some pretty fundamental operations are O(1), although using intermediate languages and special VMs tends to distort one's thinking here (e.g., how does one amortize the garbage collector and other dynamic processes over what would otherwise be O(1) activities?).
But ignoring amortization of latencies, garbage-collection, and so on, I still don't understand how the leap to assumption that certain techniques that involve some kind of searching can be O(1) except under very special conditions.
Although I have noticed this before, an example just showed up in the Pandincus question, "'Proper’ collection to use to obtain items in O(1) time in C# .NET?".
As I remarked there, the only collection I know of that provides O(1) access as a guaranteed bound is a fixed-bound array with an integer index value. The presumption is that the array is implemented by some mapping to random access memory that uses O(1) operations to locate the cell having that index.
For collections that involve some sort of searching to determine the location of a matching cell for a different kind of index (or for a sparse array with an integer index), life is not so easy. In particular, if collisions and congestion are possible, access is not exactly O(1). And if the collection is flexible, one must recognize and amortize the cost of expanding the underlying structure (such as a tree or a hash table) to relieve congestion (e.g., a high collision incidence or tree imbalance).
I would never have thought to speak of these flexible and dynamic structures as O(1). Yet I see them offered up as O(1) solutions without any identification of the conditions that must be maintained to actually have O(1) access be assured (as well as have that constant be negligibly small).
THE QUESTION: All of this preparation is really for a question. What is the casualness around O(1) and why is it accepted so blindly? Is it recognized that even O(1) can be undesirably large, even though near-constant? Or is O(1) simply the appropriation of a computational-complexity notion to informal use? I'm puzzled.
UPDATE: The Answers and comments point out where I was casual about defining O(1) myself, and I have repaired that. I am still looking for good answers, and some of the comment threads are rather more interesting than their answers, in a few cases.
The problem is that people are really sloppy with terminology. There are 3 important but distinct classes here:
O(1) worst-case
This is simple - all operations take no more than a constant amount of time in the worst case, and therefore in all cases. Accessing an element of an array is O(1) worst-case.
O(1) amortized worst-case
Amortized means that not every operation is O(1) in the worst case, but for any sequence of N operations, the total cost of the sequence is O(N) in the worst case. This means that even though we can't bound the cost of any single operation by a constant, there will always be enough "quick" operations to make up for the "slow" operations, such that the running time of the sequence of operations is linear in the number of operations.
For example, the standard Dynamic Array which doubles its capacity when it fills up requires O(1) amortized time to insert an element at the end, even though some insertions require O(N) time - there are always enough O(1) insertions that inserting N items always takes O(N) time total.
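A small Python sketch (my own illustration) of the doubling dynamic array just described; the total copying work across N appends stays O(N), so the amortized cost per append is O(1):
class DynamicArray:
    def __init__(self):
        self._capacity = 1
        self._size = 0
        self._buffer = [None] * self._capacity
        self.copies = 0                        # counts element copies, for the demo

    def append(self, value):
        if self._size == self._capacity:       # full: double and copy (the O(N) case)
            new_buffer = [None] * (2 * self._capacity)
            for i in range(self._size):
                new_buffer[i] = self._buffer[i]
                self.copies += 1
            self._buffer = new_buffer
            self._capacity *= 2
        self._buffer[self._size] = value       # the common O(1) case
        self._size += 1

arr = DynamicArray()
for i in range(1000):
    arr.append(i)
# Copies at sizes 1, 2, 4, ..., 512 sum to 1023 < 2 * 1000, so the
# average cost per append is bounded by a constant.
print(arr.copies)   # 1023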
O(1) average-case
This one is the trickiest. There are two possible definitions of average-case: one for randomized algorithms with fixed inputs, and one for deterministic algorithms with randomized inputs.
For randomized algorithms with fixed inputs, we can calculate the average-case running time for any given input by analyzing the algorithm and determining the probability distribution of all possible running times and taking the average over that distribution (depending on the algorithm, this may or may not be possible due to the Halting Problem).
In the other case, we need a probability distribution over the inputs. For example, if we were to measure a sorting algorithm, one such probability distribution would be the distribution that has all N! possible permutations of the input equally likely. Then, the average-case running time is the average running time over all possible inputs, weighted by the probability of each input.
Since the subject of this question is hash tables, which are deterministic, I'm going to focus on the second definition of average-case. Now, we can't always determine the probability distribution of the inputs because, well, we could be hashing just about anything, and those items could be coming from a user typing them in or from a file system. Therefore, when talking about hash tables, most people just assume that the inputs are well-behaved and the hash function is well behaved such that the hash value of any input is essentially randomly distributed uniformly over the range of possible hash values.
Take a moment and let that last point sink in - the O(1) average-case performance for hash tables comes from assuming all hash values are uniformly distributed. If this assumption is violated (which it usually isn't, but it certainly can and does happen), the running time is no longer O(1) on average.
See also Denial of Service via Algorithmic Complexity Attacks. In this paper, the authors discuss how they exploited weaknesses in the default hash functions used by two versions of Perl to generate large numbers of strings with hash collisions. Armed with this list of strings, they mounted a denial-of-service attack on some web servers by feeding them these strings, which triggered the worst-case O(N) behavior in the hash tables used by the web servers.
My understanding is that O(1) is not necessarily constant; rather, it is not dependent on the variables under consideration. Thus a hash lookup can be said to be O(1) with respect to the number of elements in the hash, but not with respect to the length of the data being hashed or ratio of elements to buckets in the hash.
The other element of confusion is that big O notation describes limiting behavior. Thus, a function f(N) for small values of N may indeed show great variation, but you would still be correct to say it is O(1) if the limit as N approaches infinity is constant with respect to N.
O(1) means constant time and (typically) fixed space
Just to clarify these are two separate statements. You can have O(1) in time but O(n) in space or whatever.
Is it recognized that even O(1) can be undesirably large, even though near-constant?
O(1) can be impractically HUGE and it's still O(1). It is often neglected that if you know you'll have a very small data set, the constant is more important than the complexity, and for reasonably small data sets, it's a balance of the two. An O(n!) algorithm can out-perform an O(1) one if the constants and sizes of the data sets are of the appropriate scale.
O() notation is a measure of the complexity - not the time an algorithm will take, or a pure measure of how "good" a given algorithm is for a given purpose.
I can see what you're saying, but I think there are a couple of basic assumptions underlying the claim that look-ups in a Hash table have a complexity of O(1).
The hash function is reasonably designed to avoid a large number of collisions.
The set of keys is pretty much randomly distributed, or at least not purposely designed to make the hash function perform poorly.
The worst case complexity of a Hash table look-up is O(n), but that's extremely unlikely given the above 2 assumptions.
A hashtable is a data structure that supports O(1) search and insertion.
A hashtable usually stores key and value pairs, where the key is used as the parameter to a function (a hash function) which determines the location of the value in its internal data structure, usually an array.
As insertion and search only depend upon the result of the hash function, and not on the size of the hashtable nor the number of elements stored, a hashtable has O(1) insertion and search.
There is one caveat, however. As the hashtable becomes more and more full, there will be hash collisions, where the hash function returns an element of the array which is already occupied. This necessitates collision resolution in order to find another empty element.
When a hash collision occurs, a search or insertion cannot be performed in O(1) time. However, good collision resolution algorithms can reduce the number of tries to find another suitable empty spot, or increasing the hashtable size can reduce the number of collisions in the first place.
So, in theory, only a hashtable backed by an array with an infinite number of elements and a perfect hash function would be able to achieve O(1) performance, as that is the only way to avoid hash collisions that drive up the number of required operations. Therefore, any finite-sized array will at one time or another perform worse than O(1) due to hash collisions.
Let's take a look at an example. Let's use a hashtable to store the following (key, value) pairs:
(Name, Bob)
(Occupation, Student)
(Location, Earth)
We will implement the hashtable back-end with an array of 100 elements.
The key will be used to determine an element of the array to store the (key, value) pair. In order to determine the element, the hash_function will be used:
hash_function("Name") returns 18
hash_function("Occupation") returns 32
hash_function("Location") returns 74.
From the above result, we'll assign the (key, value) pairs into the elements of the array.
array[18] = ("Name", "Bob")
array[32] = ("Occupation", "Student")
array[74] = ("Location", "Earth")
The insertion only requires the use of a hash function, and does not depend on the size of the hashtable nor its elements, so it can be performed in O(1) time.
Similarly, searching for an element uses the hash function.
If we want to look up the key "Name", we'll perform a hash_function("Name") to find out which element in the array the desired value resides.
Also, searching does not depend on the size of the hashtable nor the number of elements stored, therefore an O(1) operation.
All is well. Let's try to add an additional entry of ("Pet", "Dog"). However, there is a problem, as hash_function("Pet") returns 18, which is the same hash for the "Name" key.
Therefore, we'll need to resolve this hash collision. Let's suppose that the hash collision resolving function we used found that the new empty element is 29:
array[29] = ("Pet", "Dog")
Since there was a hash collision in this insertion, our performance was not quite O(1).
This problem will also crop up when we try to search for the "Pet" key, as trying to find the element containing the "Pet" key by performing hash_function("Pet") will always return 18 initially.
Once we look up element 18, we'll find the key "Name" rather than "Pet". When we find this inconsistency, we'll need to resolve the collision in order to retrieve the correct element which contains the actual "Pet" key. Resolving a hash collision is an additional operation which makes the hashtable not perform at O(1) time.
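Here is the walkthrough above turned into a hypothetical Python sketch. The hash values (18, 32, 74, and the "Pet"/"Name" collision at 18) are the made-up numbers from the example, hard-coded for the demonstration, and collisions are resolved with linear probing as one possible strategy (so the fallback slot comes out as 19 rather than the prose's illustrative 29):
FAKE_HASHES = {"Name": 18, "Occupation": 32, "Location": 74, "Pet": 18}

def hash_function(key):
    return FAKE_HASHES[key]       # illustrative stand-in for a real hash function

class ToyHashTable:
    def __init__(self, capacity=100):
        self.array = [None] * capacity

    def put(self, key, value):
        index = hash_function(key)
        probes = 0
        while self.array[index] is not None and self.array[index][0] != key:
            index = (index + 1) % len(self.array)   # collision: probe onward
            probes += 1
        self.array[index] = (key, value)
        return probes                               # 0 probes == the ideal O(1) case

    def get(self, key):
        index = hash_function(key)
        while self.array[index] is not None:
            if self.array[index][0] == key:
                return self.array[index][1]
            index = (index + 1) % len(self.array)   # keep probing past mismatches
        return None

table = ToyHashTable()
for key, value in [("Name", "Bob"), ("Occupation", "Student"), ("Location", "Earth")]:
    print(key, "probes:", table.put(key, value))    # all 0: no collisions yet

print("Pet probes:", table.put("Pet", "Dog"))       # 1: collided with "Name" at slot 18
print(table.get("Pet"))                             # Dog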
I can't speak to the other discussions you've seen, but there is at least one hashing algorithm that is guaranteed to be O(1).
Cuckoo hashing maintains an invariant so that there is no chaining in the hash table. Insertion is amortized O(1), retrieval is always O(1). I've never seen an implementation of it, it's something that was newly discovered when I was in college. For relatively static data sets, it should be a very good O(1), since it calculates two hash functions, performs two lookups, and immediately knows the answer.
Mind you, this assumes the hash calculation is O(1) as well. You could argue that for length-K strings, any hash is minimally O(K). In reality, you can bound K pretty easily, say K < 1000. O(K) ~= O(1) for K < 1000.
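A toy sketch of the idea (my own illustration, not a production implementation): every key can live in only one of two slots, one per hash function, so a lookup never takes more than two probes. Insertion may cascade evictions, and a real implementation rehashes with new hash functions (or grows) when a cascade runs too long.
class CuckooTable:
    def __init__(self, capacity=16):
        self.capacity = capacity
        self.table1 = [None] * capacity
        self.table2 = [None] * capacity

    def _h1(self, key):
        return hash(key) % self.capacity

    def _h2(self, key):
        return (hash(key) // self.capacity) % self.capacity

    def get(self, key, default=None):
        for table, slot in ((self.table1, self._h1(key)),
                            (self.table2, self._h2(key))):   # at most 2 probes
            entry = table[slot]
            if entry is not None and entry[0] == key:
                return entry[1]
        return default

    def put(self, key, value, max_kicks=32):
        # Overwrite in place if the key is already stored in either slot.
        for table, slot in ((self.table1, self._h1(key)),
                            (self.table2, self._h2(key))):
            if table[slot] is not None and table[slot][0] == key:
                table[slot] = (key, value)
                return
        # Otherwise insert, evicting ("kicking") occupants between the tables.
        entry = (key, value)
        for _ in range(max_kicks):
            slot = self._h1(entry[0])
            entry, self.table1[slot] = self.table1[slot], entry
            if entry is None:
                return
            slot = self._h2(entry[0])
            entry, self.table2[slot] = self.table2[slot], entry
            if entry is None:
                return
        raise RuntimeError("eviction cycle: a real table would rehash or grow")

t = CuckooTable()
for word in ["alpha", "beta", "gamma", "delta"]:
    t.put(word, len(word))
print(t.get("gamma"))     # 5
print(t.get("missing"))   # None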
There may be a conceptual error as to how you're understanding Big-Oh notation. What it means is that, given an algorithm and an input data set, the upper bound for the algorithm's run time depends on the value of the O-function when the size of the data set tends to infinity.
When one says that an algorithm takes O(n) time, it means that the runtime for an algorithm's worst case depends linearly on the size of the input set.
When an algorithm takes O(1) time, the only thing it means is that, given a function T(f) which calculates the runtime of a function f(n), there exists a natural positive number k such that T(f) < k for any input n. Essentially, it means that the upper bound for the run time of an algorithm is not dependent on its size, and has a fixed, finite limit.
Now, that does not mean in any way that the limit is small, just that it's independent of the size of the input set. So if I artificially define a bound k for the size of a data set, then its complexity will be O(k) == O(1).
For example, searching for an instance of a value on a linked list is an O(n) operation. But if I say that a list has at most 8 elements, then O(n) becomes O(8) becomes O(1).
In this case, if we used a trie data structure as a dictionary (a tree of characters, where the leaf node contains the value for the string used as key), and if the key is bounded, then its lookup time can be considered O(1) (if I define a character field as having at most k characters in length, which is a reasonable assumption for many cases).
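A minimal sketch of the trie-as-dictionary idea (hypothetical names, my own illustration): a lookup walks at most one node per character, so with keys bounded to k characters it takes at most k steps. Child lookup uses a small dict here for brevity; an array indexed by character code would make each step strictly constant.
class TrieDict:
    def __init__(self):
        self.root = {}             # each node: {char: child_node, ...}
        self.VALUE = object()      # sentinel key marking "a value is stored here"

    def put(self, key, value):
        node = self.root
        for ch in key:
            node = node.setdefault(ch, {})
        node[self.VALUE] = value

    def get(self, key, default=None):
        node = self.root
        for ch in key:             # at most len(key) steps, never more
            if ch not in node:
                return default
            node = node[ch]
        return node.get(self.VALUE, default)

d = TrieDict()
d.put("cat", 1)
d.put("car", 2)
print(d.get("car"))   # 2
print(d.get("cab"))   # None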
For a hash table, as long as you assume that the hashing function is good (randomly distributed) and sufficiently sparse so as to minimize collisions, and rehashing is performed when the data structure is sufficiently dense, you can indeed consider it an O(1) access-time structure.
In conclusion, O(1) time may be overrated for a lot of things. For large data structures the complexity of an adequate hash function may not be trivial, sufficient corner cases exist where the number of collisions leads it to behave like an O(n) data structure, and rehashing may become prohibitively expensive. In that case, an O(log(n)) structure like an AVL tree or a B-tree may be a superior alternative.
In general, I think people use them comparatively without regard to exactness. For example, hash-based data structures are O(1) (average) look up if designed well and you have a good hash. If everything hashes to a single bucket, then it's O(n). Generally, though one uses a good algorithm and the keys are reasonably distributed so it's convenient to talk about it as O(1) without all the qualifications. Likewise with lists, trees, etc. We have in mind certain implementations and it's simply more convenient to talk about them, when discussing generalities, without the qualifications. If, on the other hand, we're discussing specific implementations, then it probably pays to be more precise.
HashTable look-ups are O(1) with respect to the number of items in the table, because no matter how many items you add to the table, the cost of hashing a single item is pretty much the same, and computing the hash tells you the address of the item.
To answer why this is relevant: the OP asked about why O(1) seemed to be thrown around so casually when in his mind it obviously could not apply in many circumstances. This answer explains that O(1) time really is possible in those circumstances.
Hash table implementations are in practice not "exactly" O(1): if you test one, you'll find it averages around 1.5 lookups to find a given key across a large dataset (because collisions DO occur, and upon colliding, a different location must be assigned).
Also, in practice, HashMaps are backed by arrays with an initial size that is "grown" to double the size when the table reaches about 70% fullness, which gives a relatively good addressing space. After 70% fullness, collision rates grow faster.
Big O theory states that if you have a O(1) algorithm, or even an O(2) algorithm, the critical factor is the degree of the relation between input-set size and steps to insert/fetch one of them. O(2) is still constant time, so we just approximate it as O(1), because it means more or less the same thing.
In reality, there is only 1 way to have a "perfect hashtable" with O(1), and that requires:
A Global Perfect Hash Key Generator
An Unbounded addressing space.
(Exception case: if you can compute in advance all the permutations of permitted keys for the system, and your target backing store address space is defined to be the size where it can hold all keys that are permitted, then you can have a perfect hash, but it's a "domain limited" perfection.)
Given a fixed memory allocation, it is not plausible in the least to have this, because it would assume that you have some magical way to pack an infinite amount of data into a fixed amount of space with no loss of data, and that's logistically impossible.
So, in retrospect, getting O(1.5), which is still constant time, in a finite amount of memory with even a relatively naïve hash key generator, is something I consider pretty damn awesome.
A final note: I use O(1.5) and O(2) here. These don't actually exist in big-O; they are merely what people who don't know big-O assume is the rationale.
If something takes 1.5 steps to find a key, or 2 steps, or 1 step, but the number of steps never exceeds 2 and whether it takes 1 step or 2 is completely random, then it is still O(1). This is because no matter how many items you add to the dataset, the lookup still stays within 2 steps. If for all tables with more than 500 keys it takes 2 steps, then you can assume those 2 steps are in fact one step with 2 parts, which is still O(1).
If you can't make this assumption, then you're not thinking in Big-O terms at all, because then you must use a number that represents the count of finite computational steps required to do everything, and "one step" is meaningless to you. Just get it into your head that there is NO direct correlation between Big-O and the number of execution cycles involved.
O(1) means, exactly, that the algorithm's time complexity is bounded by a fixed value. This doesn't mean it's constant, only that it is bounded regardless of input values. Strictly speaking, many allegedly O(1) time algorithms are not actually O(1) and just go so slowly that they are bounded for all practical input values.
Yes, garbage collection does affect the asymptotic complexity of algorithms running in the garbage collected arena. It is not without cost, but it is very hard to analyze without empirical methods, because the interaction costs are not compositional.
The time spent garbage collecting depends on the algorithm being used. Typically, modern garbage collectors switch modes as memory fills up to keep these costs under control. For instance, a common approach is to use a Cheney-style copying collector when memory pressure is low, because it pays a cost proportional to the size of the live set in exchange for using more space, and to switch to a mark-and-sweep collector when memory pressure becomes greater, even though it pays a cost proportional to the live set for marking and to the whole heap or dead set for sweeping. By the time you add card-marking and other optimizations, the worst-case costs for a practical garbage collector may actually be a fair bit worse, picking up an extra logarithmic factor for some usage patterns.
So, if you allocate a big hash table, even if you access it using O(1) searches for all time during its lifetime, if you do so in a garbage collected environment, occasionally the garbage collector will traverse the entire array, because it is size O(n) and you will pay that cost periodically during collection.
The reason we usually leave it off of the complexity analysis of algorithms is that garbage collection interacts with your algorithm in non-trivial ways. How bad of a cost it is depends a lot on what else you are doing in the same process, so the analysis is not compositional.
Moreover, above and beyond the copy vs. compact vs. mark and sweep issue, the implementation details can drastically affect the resulting complexities:
Incremental garbage collectors that track dirty bits, etc. can all but make those larger re-traversals disappear.
It depends on whether your GC works periodically based on wall-clock time or runs proportional to the number of allocations.
Whether a mark and sweep style algorithm is concurrent or stop-the-world
Whether it marks fresh allocations black or leaves them white until it drops them into a black container.
Whether your language admits modification of pointers, which can let some garbage collectors work in a single pass.
Finally, when discussing an algorithm, we are discussing a straw man. The asymptotics will never fully incorporate all of the variables of your environment. Rarely do you ever implement every detail of a data structure as designed. You borrow a feature here and there, you drop a hash table in because you need fast unordered key access, you use a union-find over disjoint sets with path compression and union by rank to merge memory-regions over there because you can't afford to pay a cost proportional to the size of the regions when you merge them or what have you. These structures are thought primitives and the asymptotics help you when planning overall performance characteristics for the structure 'in-the-large' but knowledge of what the constants are matters too.
You can implement that hash table with perfectly O(1) asymptotic characteristics, just don't use garbage collection; map it into memory from a file and manage it yourself. You probably won't like the constants involved though.
I think when many people throw around the term "O(1)" they implicitly have in mind a "small" constant, whatever "small" means in their context.
You have to take all this big-O analysis with context and common sense. It can be an extremely useful tool or it can be ridiculous, depending on how you use it.
