Interaction between __hash__ and __eq__ in Python

I wrote this simple code and I am trying to understand what exactly is going on. I created two equal objects and put only one of them in a dictionary.
Then, using the second object as a key, I try to print the name attribute of its value.
Thanks to my hash function, the dictionary returns the value stored under the key I inserted, since the hash is the same for obj1 and obj2.
Here is my question: does my hash function (or the dictionary) check that the two objects are indeed equal, or is it just a case of collision?
I hope the question is clear.
class Test:
    def __init__(self, name):
        self.name = name
    def __eq__(self, other):
        return (isinstance(other, type(self)) and self.name == other.name)
    def __hash__(self):
        return hash(self.name)
obj1 = Test('abc')
obj2 = Test('abc')
d = {}
d[obj1] = obj1
print(d[obj2].name)

You can easily figure this out by testing a few combinations. Consider these two types:
class AlwaysEqualConstantHash:
    def __eq__(self, other):
        print('AlwaysEqualConstantHash eq')
        return True
    def __hash__(self):
        print('AlwaysEqualConstantHash hash')
        return 4
class NeverEqualConstantHash:
    def __eq__(self, other):
        print('NeverEqualConstantHash eq')
        return False
    def __hash__(self):
        print('NeverEqualConstantHash hash')
        return 4
Now let’s put this inside a dictionary and see what happens:
>>> d = {}
>>> d[AlwaysEqualConstantHash()] = 'a'
AlwaysEqualConstantHash hash
>>> d[AlwaysEqualConstantHash()]
AlwaysEqualConstantHash hash
AlwaysEqualConstantHash eq
'a'
>>> d[AlwaysEqualConstantHash()] = 'b'
AlwaysEqualConstantHash hash
AlwaysEqualConstantHash eq
>>> d
{<__main__.AlwaysEqualConstantHash object at 0x00000083E8174A90>: 'b'}
As you can see, the hash is computed every time to locate the entry in the dictionary. As soon as the dictionary already holds a key with the same hash, an equality comparison is also made to decide whether that existing key is equal to the new one. Since all our AlwaysEqualConstantHash objects are equal to one another, any of them can be used to access the same entry in the dictionary.
>>> d = {}
>>> d[NeverEqualConstantHash()] = 'a'
NeverEqualConstantHash hash
>>> d[NeverEqualConstantHash()]
NeverEqualConstantHash hash
NeverEqualConstantHash eq
Traceback (most recent call last):
File "<pyshell#56>", line 1, in <module>
d[NeverEqualConstantHash()]
KeyError: <__main__.NeverEqualConstantHash object at 0x00000083E8186BA8>
>>> d[NeverEqualConstantHash()] = 'b'
NeverEqualConstantHash hash
NeverEqualConstantHash eq
>>> d
{<__main__.NeverEqualConstantHash object at 0x00000083E8186F60>: 'a', <__main__.NeverEqualConstantHash object at 0x00000083E8186FD0>: 'b'}
For NeverEqualConstantHash this is very different. The hash is still computed every time, but since a new object is never equal to an existing one, we cannot retrieve the existing entries that way, and every assignment adds a new entry instead.
>>> x = NeverEqualConstantHash()
>>> d[x] = 'foo'
NeverEqualConstantHash hash
NeverEqualConstantHash eq
NeverEqualConstantHash eq
>>> d[x]
NeverEqualConstantHash hash
NeverEqualConstantHash eq
NeverEqualConstantHash eq
'foo'
If we use the exact same key object, though, we can still retrieve the element: the dictionary checks identity (is) before falling back to __eq__, so the object never has to compare equal to itself. We also see that __eq__ is called for every existing key with the same hash, to check whether the new object is equal to any of them.
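To tie this back to the Test class from the question, here is a small sketch that counts __eq__ calls. The eq_calls counter is an addition for illustration only. Looking up with obj2 should trigger exactly one comparison (same hash, different object), while looking up with obj1 itself should trigger none because of the identity shortcut. This describes CPython's dict and may be worth re-checking on other implementations.

```python
class Test:
    eq_calls = 0  # illustration only: counts how often __eq__ runs

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        Test.eq_calls += 1
        return isinstance(other, type(self)) and self.name == other.name

    def __hash__(self):
        return hash(self.name)


obj1 = Test('abc')
obj2 = Test('abc')
d = {obj1: obj1}

Test.eq_calls = 0
print(d[obj2].name)   # abc: found via the matching hash, confirmed by __eq__
print(Test.eq_calls)  # 1

Test.eq_calls = 0
print(d[obj1].name)   # abc: same object, the identity check succeeds first
print(Test.eq_calls)  # 0
```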
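So the hash is used to quickly find where an element belongs in the dictionary, which is why elements that are considered equal must have equal hashes. __eq__ is only called when an existing key has the same hash as the one being looked up, to make sure both objects really refer to the same key. In other words, the hash function itself does not check equality: a matching hash only narrows down the candidates, and __eq__ makes the final decision.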
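The flip side of that rule can be sketched with a hypothetical BadHash class whose __eq__ compares names but whose __hash__ is based on object identity. Two equal objects then get different hashes, the dictionary never gets as far as calling __eq__, and the lookup fails.

```python
class BadHash:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, BadHash) and self.name == other.name

    def __hash__(self):
        # Breaks the contract: equal objects get different hashes.
        return id(self)


key = BadHash('abc')
d = {key: 'value'}
probe = BadHash('abc')
print(probe == key)  # True: the objects compare equal
print(probe in d)    # False: different hash, so __eq__ is never consulted
print(key in d)      # True: same object, same hash
```

Relatedly, in Python 3 a class that defines __eq__ but not __hash__ gets __hash__ set to None, so its instances cannot be used as dictionary keys at all.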

Related

Overriding the `[]` operator in a dictionary of dictionaries

I am trying to implement a class which provides a dictionary with a default value:
from copy import deepcopy
class Dict:
    def __init__(self, default) -> None:
        self.default = default
        self.values = {}
    def __getitem__(self, key):
        return self.values[key] if key in self.values else deepcopy(self.default)
    def __setitem__(self, key, value):
        self.values[key] = value
It works as expected when the default value is "plain" (42 in the example below):
KEY = 'k'
d = Dict(42)
print(d[KEY]) # prints 42
d[KEY] = 53
print(d[KEY]) # prints 53
But it doesn't work as expected when the default value is by itself a Dict object:
KEY1 = 'k1'
KEY2 = 'k2'
d = Dict(Dict(42))
print(d[KEY1][KEY2]) # prints 42
d[KEY1][KEY2] = 53
print(d[KEY1][KEY2]) # prints 42
I have tried to debug that by adding various printouts within the class functions, but I haven't been able to figure it out.
What exactly am I doing wrong here?

The immediate problem is in your __getitem__ method:
def __getitem__(self, key):
    return self.values[key] if key in self.values else deepcopy(self.default)
Because you're only returning a value here, but not actually setting it, the returned value isn't useful. If you request a key that doesn't exist, the method is equivalent to:
def __getitem__(self, key):
    return deepcopy(self.default)
So when you write:
d[KEY1][KEY2] = 53
You're successfully setting a value for KEY2, but only in the fresh copy returned by __getitem__, which is never stored in self.values and is discarded right away. You probably want to use the dictionary setdefault method, which will set the key in self.values if it doesn't exist (in addition to returning it):
def __getitem__(self, key):
    return self.values.setdefault(key, deepcopy(self.default))
With this implementation:
>>> KEY1 = 'k1'
>>> KEY2 = 'k2'
>>> d = Dict(Dict(42))
>>> print(d[KEY1][KEY2])
42
>>> d[KEY1][KEY2] = 53
>>> print(d[KEY1][KEY2])
53
But as I mentioned in my comment, a better solution is just to use the existing defaultdict implementation:
>>> from collections import defaultdict
>>> d = defaultdict(lambda: defaultdict(lambda: 42))
>>> d[KEY1][KEY2]
42
>>> d[KEY1][KEY2]=53
>>> d[KEY1][KEY2]
53
(The difference between defaultdict and the class you implemented is that the default must be a callable. Here I've used lambda expressions, but you could also use actual functions, classes, etc.)
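As a sketch of the "actual functions" option, the same nested default can be built from two small named factory functions; the names leaf_default and inner_dict are hypothetical. One practical difference is that module-level functions, unlike lambdas, can be pickled, so a defaultdict built this way can usually be pickled too.

```python
from collections import defaultdict


def leaf_default():
    # Value returned for any missing inner key.
    return 42


def inner_dict():
    # Each missing outer key gets its own inner defaultdict.
    return defaultdict(leaf_default)


d = defaultdict(inner_dict)
print(d['k1']['k2'])  # 42
d['k1']['k2'] = 53
print(d['k1']['k2'])  # 53
print(d['k2']['k2'])  # 42: other outer keys are unaffected
```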

Since you are using deepcopy, __getitem__ returns a copy that has no connection to the default object you stored.
You have to return the object itself, without deepcopy:
def __getitem__(self, key):
    return self.values[key] if key in self.values else self.default
Now it should work as expected.
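This variant is worth checking before relying on it. The sketch below reuses the question's Dict class with deepcopy removed as suggested: the nested assignment now sticks, but because nothing is stored in self.values, every missing outer key returns the very same default object, so the value also shows up under other keys.

```python
class Dict:
    # The question's class, with __getitem__ changed as suggested (no deepcopy).
    def __init__(self, default) -> None:
        self.default = default
        self.values = {}

    def __getitem__(self, key):
        return self.values[key] if key in self.values else self.default

    def __setitem__(self, key, value):
        self.values[key] = value


d = Dict(Dict(42))
d['k1']['k2'] = 53
print(d['k1']['k2'])     # 53
print(d['other']['k2'])  # also 53: both outer keys share one inner Dict
print(d.values)          # {}: the outer dictionary never stored anything
```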

How to add data with same key in dictionary [duplicate]

I have a text file which contains duplicate car registration numbers with different values, like so:
EDF768, Bill Meyer, 2456, Vet_Parking
TY5678, Jane Miller, 8987, AgHort_Parking
GEF123, Jill Black, 3456, Creche_Parking
ABC234, Fred Greenside, 2345, AgHort_Parking
GH7682, Clara Hill, 7689, AgHort_Parking
JU9807, Jacky Blair, 7867, Vet_Parking
KLOI98, Martha Miller, 4563, Vet_Parking
ADF645, Cloe Freckle, 6789, Vet_Parking
DF7800, Jacko Frizzle, 4532, Creche_Parking
WER546, Olga Grey, 9898, Creche_Parking
HUY768, Wilbur Matty, 8912, Creche_Parking
EDF768, Jenny Meyer, 9987, Vet_Parking
TY5678, Jo King, 8987, AgHort_Parking
JU9807, Mike Green, 3212, Vet_Parking
I want to create a dictionary from this data, which uses the registration numbers (first column) as keys and the data from the rest of the line for values.
I wrote this code:
data_dict = {}
data_list = []
def createDictionaryModified(filename):
    path = r"C:\Users\user\Desktop"  # raw string: "\U" would otherwise be an invalid escape in Python 3
    basename = "ParkingData_Part3.txt"
    filename = path + "\\" + basename
    file = open(filename)
    contents = file.read()
    print(contents,"\n")
    data_list = [lines.split(",") for lines in contents.split("\n")]
    for line in data_list:
        regNumber = line[0]
        name = line[1]
        phoneExtn = line[2]
        carpark = line[3].strip()
        details = (name,phoneExtn,carpark)
        data_dict[regNumber] = details
    print(data_dict,"\n")
    print(data_dict.items(),"\n")
    print(data_dict.values())
The problem is that the data file contains duplicate values for the registration numbers. When I try to store them in the same dictionary with data_dict[regNumber] = details, the old value is overwritten.
How do I make a dictionary with duplicate keys?
Sometimes people want to "combine" or "merge" multiple existing dictionaries by just putting all the items into a single dict, and are surprised or annoyed that duplicate keys are overwritten. See the related question How to merge dicts, collecting values from matching keys? for dealing with this problem.

Python dictionaries don't support duplicate keys. One way around this is to store lists or sets inside the dictionary.
One easy way to achieve this is by using defaultdict:
from collections import defaultdict
data_dict = defaultdict(list)
All you have to do is replace
data_dict[regNumber] = details
with
data_dict[regNumber].append(details)
and you'll get a dictionary of lists.
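A minimal sketch of that change, run on a few rows from the question that are embedded as a string here instead of being read from the file. All fields are stripped of surrounding spaces, which the question's code only does for the last column.

```python
from collections import defaultdict

# A few rows from the question's ParkingData file, inlined for the example.
contents = """EDF768, Bill Meyer, 2456, Vet_Parking
TY5678, Jane Miller, 8987, AgHort_Parking
EDF768, Jenny Meyer, 9987, Vet_Parking"""

data_dict = defaultdict(list)
for line in contents.splitlines():
    regNumber, name, phoneExtn, carpark = (field.strip() for field in line.split(","))
    details = (name, phoneExtn, carpark)
    data_dict[regNumber].append(details)  # append instead of overwriting

print(dict(data_dict))
```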

You can subclass the built-in types in Python to change their behavior. For your case, it's easy to create a dict subclass that automatically stores duplicated values in lists under the same key:
class Dictlist(dict):
    def __setitem__(self, key, value):
        try:
            self[key]
        except KeyError:
            super(Dictlist, self).__setitem__(key, [])
        self[key].append(value)
Output example:
>>> d = Dictlist()
>>> d['test'] = 1
>>> d['test'] = 2
>>> d['test'] = 3
>>> d
{'test': [1, 2, 3]}
>>> d['other'] = 100
>>> d
{'test': [1, 2, 3], 'other': [100]}
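One caveat worth knowing, described here for CPython: dict's own constructor and update() do not go through an overridden __setitem__, so values passed that way are stored unwrapped. The sketch below shows that, plus a variant built on collections.UserDict, whose constructor and update() do route through __setitem__. The class name ListUserDict is hypothetical.

```python
from collections import UserDict


class Dictlist(dict):
    # The dict subclass from the answer above.
    def __setitem__(self, key, value):
        try:
            self[key]
        except KeyError:
            super(Dictlist, self).__setitem__(key, [])
        self[key].append(value)


class ListUserDict(UserDict):
    # Hypothetical UserDict-based variant; self.data is the wrapped plain dict.
    def __setitem__(self, key, value):
        self.data.setdefault(key, []).append(value)


a = Dictlist({'x': 1})      # dict.__init__ bypasses the override
print(a)                    # {'x': 1}

b = ListUserDict({'x': 1})  # UserDict.__init__ calls update(), which uses __setitem__
b['x'] = 2
b.update({'y': 3})
print(b)                    # {'x': [1, 2], 'y': [3]}
```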

Rather than using a defaultdict or messing around with membership tests or manual exception handling, use the setdefault method to add new empty lists to the dictionary when they're needed:
results = {} # use a normal dictionary for our output
for k, v in some_data: # the keys may be duplicates
    results.setdefault(k, []).append(v) # magic happens here!
setdefault checks whether the first argument (the key) is already in the dictionary. If it doesn't find it, it assigns the second argument (the default value, an empty list in this case) as the new value for the key. If the key does exist, nothing special is done (the default goes unused). In either case, the value (whether old or new) is returned, so we can unconditionally call append on it, knowing it will always be a list.
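Applied to the question's data, the same idea might look like the sketch below; the rows are inlined from the question and only a subset is used.

```python
# Rows in the question's format, inlined for the example.
contents = """TY5678, Jane Miller, 8987, AgHort_Parking
JU9807, Jacky Blair, 7867, Vet_Parking
TY5678, Jo King, 8987, AgHort_Parking
JU9807, Mike Green, 3212, Vet_Parking"""

results = {}  # a plain dict, no defaultdict needed
for line in contents.splitlines():
    reg_number, *details = (field.strip() for field in line.split(","))
    # setdefault returns the existing list, or inserts and returns a new one.
    results.setdefault(reg_number, []).append(tuple(details))

print(results)
```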

You can't have a dict with duplicate keys, by definition!
Instead, you can use a single key and, as its value, a list of all the elements that had that key.
So you can follow these steps:
1. See if the current element's key (from your initial data) is already in the final dict. If it is, go to step 3.
2. Otherwise, add the key to the dict with an empty list as its value.
3. Append the new value to the dict[key] list.
4. Repeat steps 1-3 for every element.
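The steps above translate almost line for line into a loop; group_by_key is a hypothetical helper name used only for this sketch.

```python
def group_by_key(pairs):
    """Group (key, value) pairs into a dict of lists, following the steps above."""
    grouped = {}
    for key, value in pairs:        # step 4: repeat for every element
        if key not in grouped:      # step 1: is the key already there?
            grouped[key] = []       # step 2: if not, add it with an empty list
        grouped[key].append(value)  # step 3: append the value
    return grouped


pairs = [('EDF768', 'Bill Meyer'), ('TY5678', 'Jane Miller'), ('EDF768', 'Jenny Meyer')]
print(group_by_key(pairs))
# {'EDF768': ['Bill Meyer', 'Jenny Meyer'], 'TY5678': ['Jane Miller']}
```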

If you want to have lists only when they are necessary, and values in any other cases, then you can do this:
class DictList(dict):
    def __setitem__(self, key, value):
        try:
            # Assumes there is a list on the key
            self[key].append(value)
        except KeyError: # If it fails, because there is no key
            super(DictList, self).__setitem__(key, value)
        except AttributeError: # If it fails because it is not a list
            super(DictList, self).__setitem__(key, [self[key], value])
You can then do the following:
dl = DictList()
dl['a'] = 1
dl['b'] = 2
dl['b'] = 3
Which will store the following {'a': 1, 'b': [2, 3]}.
I tend to use this implementation when I want to have reverse/inverse dictionaries, in which case I simply do:
my_dict = {1: 'a', 2: 'b', 3: 'b'}
rev = DictList()
for k, v in my_dict.items():
    rev[v] = k
Which will generate the same output as above: {'a': 1, 'b': [2, 3]}.
CAVEAT: This implementation relies on the non-existence of the append method (in the values you are storing). This might produce unexpected results if the values you are storing are lists. For example,
dl = DictList()
dl['a'] = 1
dl['b'] = [2]
dl['b'] = 3
would produce the same result as before, {'a': 1, 'b': [2, 3]}, but one might expect the following: {'a': 1, 'b': [[2], 3]}.

You can refer to the following article:
http://www.wellho.net/mouth/3934_Multiple-identical-keys-in-a-Python-dict-yes-you-can-.html
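In a dict, if the keys are instances of a class that does not define __eq__ and __hash__, there is no duplicate problem: such objects are compared by identity, so two keys that look the same are still different keys.
For example:
class p(object):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name
    def __str__(self):
        return self.name
d = {p('k'): 1, p('k'): 2}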
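A short check of this, with the class trimmed to __repr__: both entries survive, but the flip side is that neither can be looked up with a freshly created p('k'), so you need to keep references to the original key objects.

```python
class p(object):
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


d = {p('k'): 1, p('k'): 2}
print(d)            # {k: 1, k: 2}
print(len(d))       # 2: identity-based hashing and equality keep both keys
print(p('k') in d)  # False: a new p('k') is yet another distinct key
```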

You can't have duplicated keys in a dictionary. Use a dict of lists:
for line in data_list:
    regNumber = line[0]
    name = line[1]
    phoneExtn = line[2]
    carpark = line[3].strip()
    details = (name,phoneExtn,carpark)
    if regNumber not in data_dict:  # dict.has_key() was removed in Python 3
        data_dict[regNumber] = [details]
    else:
        data_dict[regNumber].append(details)

It's a pretty old question, but maybe my solution will help someone.
By overriding the __hash__ magic method, you can store equal objects as separate keys in a dict.
Example:
from random import choices
class DictStr(str):
    """
    This class behaves exactly like the str class but
    can be duplicated in a dict
    """
    def __new__(cls, value='', custom_id='', id_length=64):
        # If you want to know why I use __new__ instead of __init__
        # SEE: https://stackoverflow.com/a/2673863/9917276
        obj = str.__new__(cls, value)
        if custom_id:
            obj.id = custom_id
        else:
            # Make a random id string of length id_length (64 by default)
            choice_str = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890"
            obj.id = ''.join(choices(choice_str, k=id_length))
        return obj
    def __hash__(self) -> int:
        return self.id.__hash__()
Now let's create a dict:
>>> a_1 = DictStr('a')
>>> a_2 = DictStr('a')
>>> a_3 = 'a'
>>> a_1
a
>>> a_2
a
>>> a_1 == a_2 == a_3
True
>>> d = dict()
>>> d[a_1] = 'some_data'
>>> d[a_2] = 'other'
>>> print(d)
{'a': 'some_data', 'a': 'other'}
NOTE: This approach can be applied to other built-in types as well, such as int and float.
EXPLANATION:
We can use almost any object as a key in a dict (more widely known as a HashMap or HashTable in other languages), but there has to be a way to distinguish between keys, because the dict knows nothing about the objects themselves.
For this purpose, objects used as dictionary keys have to provide a number, their hash, which the dict uses to locate them. Hashes are not guaranteed to be unique; when two keys have the same hash, the dict falls back to __eq__ to decide whether they are the same key.
Because dictionaries are so widely used, most programming languages expose this through a built-in hash method that the dict calls when it searches for a key.
So if you change the __hash__ method of your class, you change how its instances behave as dictionary keys.
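The trade-off is that DictStr breaks the rule that equal objects must have equal hashes, so lookups only work with the exact object used as the key (or one with the same id). The sketch below condenses the class and, as a swap from the answer's literal alphabet, builds the id from string.ascii_letters and string.digits.

```python
from random import choices
from string import ascii_letters, digits


class DictStr(str):
    # Condensed version of the answer's class: equality comes from str,
    # but the hash comes from a per-instance id.
    def __new__(cls, value='', custom_id='', id_length=64):
        obj = str.__new__(cls, value)
        obj.id = custom_id or ''.join(choices(ascii_letters + digits, k=id_length))
        return obj

    def __hash__(self) -> int:
        return hash(self.id)


a_1 = DictStr('a')
d = {a_1: 'some_data'}
print(d[a_1])    # some_data: same object, same hash
print('a' in d)  # False: 'a' == a_1, but hash('a') differs

# Two instances sharing a custom_id hash alike, so they act as one key again.
print(DictStr('a', custom_id='x') in {DictStr('a', custom_id='x'): 1})  # True
```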

Dictionaries do not support duplicate keys; instead, you can use defaultdict.
Below is an example of how to use defaultdict in Python 3 to solve your problem:
from collections import defaultdict
sdict = defaultdict(list)
# contents is the file text read in the question's code
data_list = [lines.split(",") for lines in contents.split("\n")]
for data in data_list:
    key = data.pop(0)
    detail = data
    sdict[key].append(detail)  # defaultdict creates the empty list on first use
print("\n", dict(sdict))
The code above produces the following output:
{'EDF768': [[' Bill Meyer', ' 2456', ' Vet_Parking'], [' Jenny Meyer', ' 9987', ' Vet_Parking']], 'TY5678': [[' Jane Miller', ' 8987', ' AgHort_Parking'], [' Jo King', ' 8987', ' AgHort_Parking']], 'GEF123': [[' Jill Black', ' 3456', ' Creche_Parking']], 'ABC234': [[' Fred Greenside', ' 2345', ' AgHort_Parking']], 'GH7682': [[' Clara Hill', ' 7689', ' AgHort_Parking']], 'JU9807': [[' Jacky Blair', ' 7867', ' Vet_Parking'], [' Mike Green', ' 3212', ' Vet_Parking']], 'KLOI98': [[' Martha Miller', ' 4563', ' Vet_Parking']], 'ADF645': [[' Cloe Freckle', ' 6789', ' Vet_Parking']], 'DF7800': [[' Jacko Frizzle', ' 4532', ' Creche_Parking']], 'WER546': [[' Olga Grey', ' 9898', ' Creche_Parking']], 'HUY768': [[' Wilbur Matty', ' 8912', ' Creche_Parking']]}

How can I make an Enum that allows reused keys? [duplicate]

I'm trying to get the name of an enum member given one of its multiple values:
class DType(Enum):
    float32 = ["f", 8]
    double64 = ["d", 9]
When I look up a value by name, it works:
print(DType["float32"].value[1]) # prints 8
print(DType["float32"].value[0]) # prints f
but when I try to get the name from a given value, I only get errors:
print(DataType(8).name)
print(DataType("f").name)
raise ValueError("%s is not a valid %s" % (value, cls.name))
ValueError: 8 is not a valid DataType
ValueError: f is not a valid DataType
Is there a way to make this? Or am I using the wrong data structure?

The easiest way is to use the aenum library¹, which would look like this:
from aenum import MultiValueEnum
class DType(MultiValueEnum):
    float32 = "f", 8
    double64 = "d", 9
and in use:
>>> DType("f")
<DType.float32: 'f'>
>>> DType(9)
<DType.double64: 'd'>
As you can see, the first value listed is the canonical value, and shows up in the repr().
If you want all the possible values to show up, or need to use the stdlib Enum (Python 3.4+), then the answer found here is the basis of what you want (and will also work with aenum):
class DType(Enum):
    float32 = "f", 8
    double64 = "d", 9
    def __new__(cls, *values):
        obj = object.__new__(cls)
        # first value is canonical value
        obj._value_ = values[0]
        for other_value in values[1:]:
            cls._value2member_map_[other_value] = obj
        obj._all_values = values
        return obj
    def __repr__(self):
        return '<%s.%s: %s>' % (
            self.__class__.__name__,
            self._name_,
            ', '.join([repr(v) for v in self._all_values]),
            )
and in use:
>>> DType("f")
<DType.float32: 'f', 8>
>>> DType(9)
<DType.double64: 'd', 9>
1 Disclosure: I am the author of the Python stdlib Enum, the enum34 backport, and the Advanced Enumeration (aenum) library.
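Another stdlib-only option, sketched here as an alternative rather than taken from the answer, is the Enum._missing_ hook (Python 3.6+), which is called when DType(value) finds no member whose value equals value. It keeps the tuples as member values and searches them, so nothing touches _value2member_map_; the trade-off is a linear scan over the members on every fallback lookup.

```python
from enum import Enum


class DType(Enum):
    float32 = ("f", 8)
    double64 = ("d", 9)

    @classmethod
    def _missing_(cls, value):
        # Called only when value is not an exact member value such as ("f", 8).
        for member in cls:
            if value in member.value:
                return member
        return None  # Enum then raises ValueError as usual


print(DType(8).name)              # float32
print(DType("d").name)            # double64
print(DType["float32"].value[0])  # f
```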

Why I cannot get the key list of a dictionary in Python3.6?

I am learning Python. I tried to get the keys of a dictionary, but I only get the last key. As I understand it, the keys() method returns all the keys in the dictionary.
My questions are:
1. Why can't I get all the keys?
2. If I have a dictionary, how can I get the value if I know the key? E.g. dict = {'Ben':8, 'Joe':7, 'Mary' : 9}. How can I input key = "Ben" so that the program outputs the value 8? The tutorial says that keys must be immutable. This constraint seems very inconvenient when trying to get a value for a given key.
Any suggestion would be highly appreciated.
Here is my code.
import os, tarfile, urllib
work_path = os.getcwd()
input_control_file = "input_control"
input_control= work_path + "/" + input_control_file
#open control file if file exist
#read setting info
try:
    #if the file does not exist,
    #then it would throw an IOError
    f = open(input_control, 'r')
    #define dictionary/hash table
    for LINE in f:
        LINE = LINE.strip() #remove leading and trailing whitespace
        lst = LINE.split() #split string into lists
        lst[0] = lst[0].split(":")[0]
        dic = {lst[0].strip():lst[1].strip()}
except IOError:
    # print(os.error) will <class 'OSError'>
    print("Reading file error. File " + input_control + " does not exist.")
#get keys
def getkeys(dict):
    return list(dict.keys())
print("l39")
print(getkeys(dic))
print("end")
Below are the outputs.
l39
['source_type']
end

The reason is that you are reassigning the variable dic in every iteration of the for loop. You are not updating or adding to the dictionary; instead, you rebind the variable to a new single-entry dictionary each time, so dic ends up holding only the last entry. You can change your for loop to:
dic = {}
for LINE in f:
    LINE = LINE.strip() #remove leading and trailing whitespace
    lst = LINE.split() #split string into lists
    lst[0] = lst[0].split(":")[0]
    dic.update({lst[0].strip():lst[1].strip()}) # update the dictionary with new values.
For your other question, if you have the dictionary dic = {'Ben':8, 'Joe':7, 'Mary' : 9}, then you can get the value with dic['Ben']. It returns the value 8, or raises KeyError if the key 'Ben' is not in the dictionary. To avoid a KeyError, you can use the dictionary's get() method, which returns None (or a default passed as the second argument) if the key is not found.
val = dic['Ben'] # returns 8
val = dic['Hen'] # will raise KeyError
val = dic.get('Hen') # will return None
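Putting both parts together, here is a sketch that parses lines in the same "name: value" shape as the question's control file and then looks values up by key. The sample lines and the parse_control helper are hypothetical.

```python
def parse_control(lines):
    """Build one dict from 'key: value' lines, mirroring the question's parsing."""
    dic = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of failing on lst[1]
        lst = line.split()
        key = lst[0].split(":")[0].strip()
        dic[key] = lst[1].strip()  # add to the existing dict, don't replace it
    return dic


sample = ["source_type: local\n", "output_dir: /tmp/out\n", "\n"]
dic = parse_control(sample)
print(list(dic.keys()))               # ['source_type', 'output_dir']
print(dic['source_type'])             # local
print(dic.get('missing'))             # None
print(dic.get('missing', 'default'))  # default
```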

In your for loop, you are re-creating the dictionary on every iteration, while you need to update it, i.e., add each key-value pair to the existing dictionary (created once with dic = {} before the loop). For this, use
dic.update({lst[0].strip() : lst[1].strip()})
This adds the key-value pair to the dictionary. Now, when you use dic.keys(), you will get all the keys of dic (in Python 3 as a view; wrap it in list() if you need a list).
As for your second question, you access a dictionary much like a list, except that a list is accessed by index and a dictionary by key. Say you have a list and a dictionary:
lst = [1, 2, 3, 4, 5]
dic = {'a' : 1, 'b' : 2, 'c' : 3, 'd' : 4, 'e' : 5}
To get value 2 from list, you do lst[1], i.e., value at index 1. Similarly, if you want to get the value 2 from dictionary, do dic['b'], i.e., value of key 'b'. It is as simple as that.

Recursively iterate through a nested dict and return value of the first matching key

I have a deeply nested dict and need to iterate through it and return the value corresponding to the key argument, second argument of my function.
For example, with
tree = {"a": 12, "g":{ "b": 2, "c": 4}, "d":5}
tree_traverse(tree, "d") should return 5
Here is my code:
def tree_traverse(tree, key):
    for k,v in tree.items():
        if isinstance(v, dict):
            tree_traverse(v, key)
        elif k == key:
            return v
The problem I have is that this function returns None when it doesn't find the matching key after iterating through the deepest nested dict.
I don't want it to return anything before the matching key is found.
I didn't find a solution in other threads; most of them use print statements and don't return anything, so I guess they avoid this issue.

You have to check whether the recursive call actually found something, and otherwise continue the loop. E.g. try the following (the := operator requires Python 3.8+):
def tree_traverse(tree, key):
    if key in tree:
        return tree[key]
    # dict.__instancecheck__(v) is equivalent to isinstance(v, dict)
    for v in filter(dict.__instancecheck__, tree.values()):
        if (found := tree_traverse(v, key)) is not None:
            return found
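If you would rather keep the structure of the original function, the same check can be added directly to it. This is a sketch along the lines of the answer above, not a different algorithm; like that version, it cannot tell a missing key apart from a key whose value is None.

```python
def tree_traverse(tree, key):
    for k, v in tree.items():
        if isinstance(v, dict):
            found = tree_traverse(v, key)
            if found is not None:  # only stop if the subtree had the key
                return found
        elif k == key:
            return v
    return None  # explicit: key not found anywhere in this subtree


tree = {"a": 12, "g": {"b": 2, "c": 4}, "d": 5}
print(tree_traverse(tree, "d"))  # 5
print(tree_traverse(tree, "c"))  # 4
print(tree_traverse(tree, "z"))  # None
```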

Here we create an object, _marker, once when the function is defined (default argument values are evaluated at definition time), so all calls of the function share it. We return this object if we don't find the key. (You could also use None here, but None is frequently a meaningful value.)
def tree_traverse(tree, key, *, _marker=object()):
    for k,v in tree.items():
        if isinstance(v, dict):
            res = tree_traverse(v, key, _marker=_marker)
            if res is not _marker:
                return res
        elif k == key:
            return v
    return _marker
def find(tree, key):
    _marker = object()
    res = tree_traverse(tree, key, _marker=_marker)
    if res is _marker:
        raise KeyError("Key {} not found".format(key))
    return res
I use tree_traverse as a helper function because we want different behaviour at the outermost layer of our recursion (throw an error) than we want inside (return a _marker object)

The NestedDict class from the third-party ndicts package can solve the problem:
from ndicts import NestedDict
def tree_traverse(tree, k):
    nd = NestedDict(tree)
    for key, value in nd.items():
        if k in key:
            return value
>>> tree = {"a": 12, "g":{ "b": 2, "c": 4}, "d":5}
>>> tree_traverse(tree, "d")
5
To install ndicts: pip install ndicts
