Merge dictionaries to overwrite duplicate keys without overwriting duplicate and non-duplicate values - python-3.x

Input:
dict1 = {'a': ['xxx', 'zzz']}
dict2 = {'a': ['yyy', 'zzz']}
Desired output:
dict3 = {'a': ['xxx', 'zzz', 'yyy', 'zzz']}
I have tried:
dict3 = dict1 | dict2
and
dict3 = dict1.copy()
dict3 |= dict2
However, the merge (|) and update (|=) operators give precedence to the last-seen dict and overwrite the value of any duplicate key, resulting in:
dict3 = {'a': ['yyy', 'zzz']}

This is the documented behavior, as stated in PEP 584:
Dict union will return a new dict consisting of the left operand
merged with the right operand, each of which must be a dict (or an
instance of a dict subclass). If a key appears in both operands, the
last-seen value (i.e. that from the right-hand operand) wins
You may need to merge the two dicts by hand:
In [8]: dict1 = {'a': ['xxx', 'zzz']}
   ...: dict2 = {'a': ['yyy', 'zzz']}
   ...: for k, v in dict2.items():
   ...:     if k in dict1:
   ...:         dict1[k] += v
   ...:
In [9]: print(dict1)
{'a': ['xxx', 'zzz', 'yyy', 'zzz']}

The answer from xiez does not do what the OP requested, for two reasons:
It updates dict1 in place instead of creating a new, independent dict3.
It misses all items that exist solely in dict2, because it only acts on keys that are present in dict1 (line 4), although this is not apparent from the one given example.
A possible solution:
dict3 = {
    k: dict1.get(k, []) + dict2.get(k, [])
    for k in set(dict1) | set(dict2)
}
For each key in the union of the keys of both source dicts, we use the dict's get method to fetch the value for that key if it exists, or the default empty list otherwise. We then build the new dict3 by concatenating the found or default lists for each key. Because + creates a new list, dict3 does not share its lists with dict1 or dict2.
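If more than two dictionaries need to be combined, the same idea can be sketched with collections.defaultdict. This is only an illustration: the function name merge_list_values is hypothetical, the extra key 'b' is made-up sample data, and the sketch assumes every value is a list.

```python
from collections import defaultdict


def merge_list_values(*dicts):
    """Return a new dict whose values are the concatenated lists of all inputs.

    Hypothetical helper; assumes every value in every input dict is a list.
    """
    merged = defaultdict(list)
    for d in dicts:
        for key, values in d.items():
            # extend() copies the elements into a list owned by `merged`,
            # so the input lists are never mutated or shared.
            merged[key].extend(values)
    return dict(merged)


dict1 = {'a': ['xxx', 'zzz']}
dict2 = {'a': ['yyy', 'zzz'], 'b': ['www']}
dict3 = merge_list_values(dict1, dict2)
print(dict3)  # {'a': ['xxx', 'zzz', 'yyy', 'zzz'], 'b': ['www']}
```

Unlike the set-based comprehension, this keeps the keys in first-seen order, which may or may not matter for your use case.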

Related

python merging nested dictionaries into a new dictionary and update that dictionary without affecting originals

I'm using Python 3 and I'm trying to merge two dictionaries of dictionaries into one dictionary. I'm successful, however, when I add a new key/value pair to the new dictionary, it also adds it to both of the original dictionaries. I would like the original dictionaries to remain unchanged.
dict1 = {'key_val_1': {'a': '1', 'b': '2', 'c': '3'}}
dict2 = {'key_val_2': {'d': '4', 'e': '5', 'f': '6'}}
dict3 = dict1 | dict2
for x in dict3:
    dict3[x]['g'] = '7'
The above code will append 'g': '7' to all 3 dictionaries and I only want to alter dict3. I have to assume that this is the intended behavior, but for the life of me I can't understand why (or how to get the desired results).
I believe the root of your problem is the assumption that Python copies the inner dictionaries when it merges dict1 and dict2. It does not: dict1 | dict2 creates a new outer dictionary, but its values are references to the same inner dictionaries that dict1 and dict2 hold. So when you change one of the inner dictionaries through dict3, you are in fact changing the dictionary that dict1 or dict2 also refers to. To remedy this, either make deep copies of the dictionaries before merging them, or write your own merge that copies each inner dictionary.
Using the copy function:
from copy import deepcopy
dict3 = deepcopy(dict1) | deepcopy(dict2)
Now dict3 contains independent copies of dict1 and dict2
To merge the dicts:
def merge(d1, d2):
    rslt = dict()
    for k in d1.keys():
        rslt[k] = d1[k].copy()  # Note: still necessary to copy the underlying dict
    for k in d2.keys():
        rslt[k] = d2[k].copy()
    return rslt
then use:
dict3 = merge(dict1, dict2)
The issue you are having arises because dict3 contains references to the sub-dicts in dict1 and dict2, and dict objects are mutable. So when you change a dict in one place, it affects every place where it is referenced. You can verify this with the id() function. Example:
>>> print(id(dict1['key_val_1']))
140294429633472
>>> print(id(dict3['key_val_1']))
140294429633472
>>> print(id(dict2['key_val_2']))
140294429633728
>>> print(id(dict3['key_val_2']))
140294429633728
From the example above you can verify that the sub-dicts in dict1 and dict2 are referenced in dict3. So when you modify them through dict3, the original dicts are also modified, as dicts are mutable.
So, to solve your issue, you need to make a deep copy of each sub-dict before merging them.
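A minimal sketch of that fix, using the dictionaries from the question. It assumes Python 3.9 or later for the | operator; the name shallow is introduced here only for illustration.

```python
from copy import deepcopy

dict1 = {'key_val_1': {'a': '1', 'b': '2', 'c': '3'}}
dict2 = {'key_val_2': {'d': '4', 'e': '5', 'f': '6'}}

# The | operator builds a new outer dict, but its values are the same
# inner dict objects that dict1 and dict2 hold (a shallow merge).
shallow = dict1 | dict2
print(shallow['key_val_1'] is dict1['key_val_1'])  # True

# Deep-copying the merged result gives dict3 its own inner dicts.
dict3 = deepcopy(dict1 | dict2)
print(dict3['key_val_1'] is dict1['key_val_1'])  # False

for key in dict3:
    dict3[key]['g'] = '7'

print('g' in dict1['key_val_1'])  # False, the originals are unchanged
print(dict3['key_val_1'])  # {'a': '1', 'b': '2', 'c': '3', 'g': '7'}
```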

sort values of lists inside dictionary based on length of characters

d = {'A': ['A11117',
'33465'
'17160144',
'A11-33465',
'3040',
'A11-33465 W1',
'nor'], 'B': ['maD', 'vern', 'first', 'A2lRights']}
I have a dictionary d and I would like to sort the values based on length of characters. For instance, for key A the value A11-33465 W1 would be first because it contains 12 characters followed by 'A11-33465' because it contains 9 characters etc. I would like this output:
d = {'A': ['A11-33465 W1',
' A11-33465',
'17160144',
'A11117',
'33465',
'3040',
'nor'],
'B': ['A2lRights',
'first',
'vern',
'maD']}
(I understand that dictionaries are not able to be sorted but I have examples below that didn't work for me but the answer contains a dictionary that was sorted)
I have tried the following
python sorting dictionary by length of values
print(' '.join(sorted(d, key=lambda k: len(d[k]), reverse=True)))
Sort a dictionary by length of the value
sorted_items = sorted(d.items(), key = lambda item : len(item[1]))
newd = dict(sorted_items[-2:])
How do I sort a dictionary by value?
import operator
sorted_x = sorted(d.items(), key=operator.itemgetter(1))
But none of them gives me what I am looking for.
How do I get my desired output?
You are not sorting the dict, you are sorting the lists inside it. The simplest will be a loop that sorts the lists in-place:
for k, lst in d.items():
    lst.sort(key=len, reverse=True)
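This will turn d into the following (note that '33465' and '17160144' appear as one string, because the comma between them is missing in the question's input and Python joins adjacent string literals):
{'A': ['3346517160144', 'A11-33465 W1', 'A11-33465', 'A11117', '3040', 'nor'],
 'B': ['A2lRights', 'first', 'vern', 'maD']}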
If you want to keep the original data intact, use a comprehension like:
sorted_d = {k: sorted(lst, key=len, reverse=True) for k, lst in d.items()}
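When several strings have the same length, both approaches above keep them in their original relative order, because Python's sort is stable. If a deterministic tie-break is wanted, a tuple key is one option. The alphabetical tie-break and the extra sample string 'abc' below are assumptions for illustration, not something the question asked for.

```python
d = {'B': ['maD', 'vern', 'abc', 'first', 'A2lRights']}

# Sort by descending length, then alphabetically among equal lengths.
# Negating the length avoids reverse=True, which would also reverse
# the alphabetical tie-break.
sorted_d = {
    k: sorted(lst, key=lambda s: (-len(s), s))
    for k, lst in d.items()
}
print(sorted_d)  # {'B': ['A2lRights', 'first', 'vern', 'abc', 'maD']}
```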

How to remove a character from the values of a dictionary?

dict_data = {'c': ['d\n', 'e\n'], 'm':['r\n','z\n','o']}
a dictionary dict_data remove '\n' in the values
(order is not important.):
should return: {'c': ['d', 'e'], 'm':['r','z','o']}
This is what I tried:
def dicts(dict_data):
    for k, v in dict_data.items():
        for i in v:
            f = i.strip('\n')
    return f
How can I get this without doing anything too complicated?
You were on the right track, but you probably assumed that calling i.strip('\n') inside for i in v would make the change appear in dict_data. This isn't the case. Strings are immutable, so strip returns a new string; your code binds that new string to f and never stores it back in the dictionary.
A correct approach would be to build a list of the stripped elements and re-assign to the corresponding dictionary key:
def strip_dicts(dict_data):
    for k, v in dict_data.items():
        f = []
        for i in v:
            f.append(i.strip('\n'))
        dict_data[k] = f
Of course, remember that this alters the argument dictionary in place.
You can create a different function that returns a new dictionary by using a comprehension:
def strip_dicts(d):
    # strip() with no argument removes all surrounding whitespace, not only '\n'
    return {k: [i.strip() for i in v] for k, v in d.items()}
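The difference between the stripping variants can be seen in a short sketch. The sample value ' z \n' is an assumption added here to make the difference visible; it is not part of the question's data.

```python
dict_data = {'c': ['d\n', 'e\n'], 'm': ['r\n', ' z \n', 'o']}

# rstrip('\n') removes only trailing newlines and leaves other whitespace alone.
only_newlines = {k: [s.rstrip('\n') for s in v] for k, v in dict_data.items()}
print(only_newlines)  # {'c': ['d', 'e'], 'm': ['r', ' z ', 'o']}

# strip() with no argument removes all leading and trailing whitespace.
all_whitespace = {k: [s.strip() for s in v] for k, v in dict_data.items()}
print(all_whitespace)  # {'c': ['d', 'e'], 'm': ['r', 'z', 'o']}

# The original dictionary is untouched because each comprehension builds new lists.
print(dict_data['c'] == ['d\n', 'e\n'])  # True
```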

Merge Keys by common value from the same dictionary

Let's say that I have a dictionary that contains the following:
myDict = {'A':[1,2], 'B': [4,5], 'C': [1,2]}
I want to create a new dictionary, merged, that merges the keys that have the same value, so my merged would be:
merged = {['A', 'C']: [1, 2], 'B': [4, 5]}
I have tried using the method suggested in this thread, but cannot replicate what I need.
Any suggestions?
What you have asked for is not possible. The keys in your hypothetical dictionary are mutable lists. Because lists cannot be hashed, you can't use them as dictionary keys.
Edit: I had a go at doing what you asked for, except that the keys are all tuples. This code is a mess, but you may be able to clean it up.
myDict = {'A': [1, 2],
          'B': [4, 5],
          'C': [1, 2],
          'D': [1, 2],
          }
# turn all values into hashable tuples
myDict2 = {k: tuple(v) for k, v in myDict.items()}
print(myDict2)
# make a set of the unique values
unique = {tuple(v) for v in myDict.values()}
print(unique)  # {(1, 2), (4, 5)} (set order may vary)
# Iterate over each unique value and build a temporary shared_keys list of the
# keys under which that value is found. Add the new key/value pairs to a new
# dictionary.
new_dict = {}
for value in unique:
    shared_keys = []
    for key in myDict:
        if tuple(myDict[key]) == value:
            shared_keys.append(key)
    new_dict[tuple(shared_keys)] = value
print(new_dict)  # {('A', 'C', 'D'): (1, 2), ('B',): (4, 5)} (order may vary)
# change the values back into mutable lists from tuples
final_dict = {k: list(v) for k, v in new_dict.items()}
print(final_dict)
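One possible cleanup of the code above, offered as a sketch rather than the answerer's own version, groups the keys in a single pass with collections.defaultdict. The names my_dict, keys_by_value and merged are chosen here for illustration.

```python
from collections import defaultdict

my_dict = {'A': [1, 2], 'B': [4, 5], 'C': [1, 2], 'D': [1, 2]}

# Step 1: group keys by value. Lists are unhashable, so each value is
# converted to a tuple before being used as a dictionary key.
keys_by_value = defaultdict(list)
for key, value in my_dict.items():
    keys_by_value[tuple(value)].append(key)

# Step 2: invert the grouping, using a tuple of the shared keys as the new key.
merged = {tuple(keys): list(value) for value, keys in keys_by_value.items()}
print(merged)  # {('A', 'C', 'D'): [1, 2], ('B',): [4, 5]}
```

This visits each item once instead of rescanning the whole dictionary for every unique value.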

How do I create a default dictionary of dictionaries

I am trying to write some code that involves creating a default dictionary of dictionaries. However, I have no idea how to initialise/create such a thing. My current attempt looks something like this:
from collections import defaultdict
inner_dict = {}
dict_of_dicts = defaultdict(inner_dict(int))
The purpose of this default dict of dictionaries is as follows: for each pair of words that I produce from a file I open (e.g. [['M UH M', 'm oo m']]), each space-delimited segment of the first word becomes a key in the outer dictionary, and for each space-delimited segment of the second word I count the frequency of that segment.
For example
[['M UH M', 'm oo m']]
(<class 'dict'>, {'M': {'m': 2}, 'UH': {'oo': 1}})
Having just run this now it doesn't seem to have output any errors, however I was just wondering if something like this will actually produce a default dictionary of dictionaries.
Apologies if this is a duplicate, however previous answers to these questions have been confusing and in a different context.
To initialise a defaultdict that creates dictionaries as its default value you would use:
d = defaultdict(dict)
For this particular problem, a collections.Counter would be more suitable:
>>> from collections import defaultdict, Counter
>>> d = defaultdict(Counter)
>>> for a, b in zip(*[x.split() for x in ['M UH M', 'm oo m']]):
...     d[a][b] += 1
...
>>> print(d)
defaultdict(<class 'collections.Counter'>, {'M': Counter({'m': 2}), 'UH': Counter({'oo': 1})})
Edit
You expressed interest in a comment about the equivalent without a Counter. Here is the equivalent using a plain dict
>>> from collections import defaultdict
>>> d = defaultdict(dict)
>>> for a, b in zip(*[x.split() for x in ['M UH M', 'm oo m']]):
...     d[a][b] = d[a].get(b, 0) + 1
...
>>> print(d)
defaultdict(<class 'dict'>, {'M': {'m': 2}, 'UH': {'oo': 1}})
You could also use a normal dictionary and its setdefault method.
my_dict.setdefault(key, default) will look up my_dict[key] and ...
... if the key already exists, return its current value without modifying it, or ...
... otherwise assign the default value (my_dict[key] = default) and then return that.
So whenever you want to get a value from your outer dictionary, you can call my_dict.setdefault(key, {}) instead of the normal my_dict[key]. This retrieves the real value assigned to the key if it's present, or else a new empty dictionary as the default value, which is automatically stored in your outer dictionary as well.
Example:
outer_dict = {"M": {"m": 2}}
inner_dict = outer_dict.setdefault("UH", {})
# outer_dict = {"M": {"m": 2}, "UH": {}}
# inner_dict = {}
inner_dict["oo"] = 1
# outer_dict = {"M": {"m": 2}, "UH": {"oo": 1}}
# inner_dict = {"oo": 1}
inner_dict = outer_dict.setdefault("UH", {})
# outer_dict = {"M": {"m": 2}, "UH": {"oo": 1}}
# inner_dict = {"oo": 1}
inner_dict["xy"] = 3
# outer_dict = {"M": {"m": 2}, "UH": {"oo": 1, "xy": 3}}
# inner_dict = {"oo": 1, "xy": 3}
This way you always get a valid inner_dict, either an empty default one or the one that's already present for the given key. As dictionaries are mutable data types, modifying the returned inner_dict will also modify the dictionary inside outer_dict.
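Applied to the word-pair counting from the question, the setdefault approach might look like the following sketch. The variable names pairs and counts are assumptions; the input is the example pair from the question.

```python
pairs = [['M UH M', 'm oo m']]

counts = {}
for first, second in pairs:
    # Pair up the space-delimited segments of the two words.
    for outer_key, inner_key in zip(first.split(), second.split()):
        # setdefault returns the existing inner dict, or stores and returns {}.
        inner = counts.setdefault(outer_key, {})
        inner[inner_key] = inner.get(inner_key, 0) + 1

print(counts)  # {'M': {'m': 2}, 'UH': {'oo': 1}}
```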
The other answers propose alternative solutions or show that you can make a default dictionary of dictionaries using d = defaultdict(dict),
but the question asked how to make a default dictionary of default dictionaries. My naive first attempt was this:
from collections import defaultdict
my_dict = defaultdict(defaultdict(list))
However, this throws an error: TypeError: first argument must be callable or None
So my second attempt, which works, is to make a callable by using the lambda keyword to create an anonymous function:
from collections import defaultdict
my_dict = defaultdict(lambda: defaultdict(list))
This is more concise than the alternative method using a regular function:
from collections import defaultdict
def default_dict_maker():
    return defaultdict(list)
my_dict = defaultdict(default_dict_maker)
You can check that it works by assigning:
>>> my_dict[2][3] = 5
>>> my_dict[2][3]
5
or by trying to return a value:
>>> my_dict[0][0]
[]
>>> my_dict[5]
defaultdict(<class 'list'>, {})
tl;dr
This is your one-line answer: my_dict = defaultdict(lambda: defaultdict(list))
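One caveat worth knowing: a defaultdict whose factory is a lambda cannot be pickled with the standard pickle module. If that matters, functools.partial is a possible alternative factory. The sketch below is an illustration with made-up keys ('fruits', 'red'), not part of the original answer.

```python
import pickle
from collections import defaultdict
from functools import partial

# partial(defaultdict, list) is a callable that returns defaultdict(list),
# so it can serve as the default factory of the outer defaultdict.
my_dict = defaultdict(partial(defaultdict, list))
my_dict['fruits']['red'].append('apple')
print(my_dict['fruits']['red'])  # ['apple']

# Unlike a lambda factory, this structure survives a pickle round trip.
restored = pickle.loads(pickle.dumps(my_dict))
print(restored['fruits']['red'])  # ['apple']

# Convert to plain dicts once the automatic defaults are no longer wanted.
plain = {outer: dict(inner) for outer, inner in my_dict.items()}
print(plain)  # {'fruits': {'red': ['apple']}}
```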
