Python: merging nested dictionaries into a new dictionary and updating that dictionary without affecting the originals - python-3.x

I'm using Python 3 and I'm trying to merge two dictionaries of dictionaries into one dictionary. The merge itself works; however, when I add a new key/value pair to the new dictionary, it is also added to both of the original dictionaries. I would like the original dictionaries to remain unchanged.
dict1 = {'key_val_1': {'a': '1', 'b': '2', 'c': '3'}}
dict2 = {'key_val_2': {'d': '4', 'e': '5', 'f': '6'}}
dict3 = dict1 | dict2
for x in dict3:
    dict3[x]['g'] = '7'
The above code will append 'g': '7' to all 3 dictionaries and I only want to alter dict3. I have to assume that this is the intended behavior, but for the life of me I can't understand why (or how to get the desired results).

I believe the root of your problem is the assumption that when dict1 and dict2 are merged with |, Python copies the nested dictionaries before merging them. In fact Python creates a new outer dictionary whose values are references to the same inner dictionaries. With this in mind, when you change the contents of an inner dictionary through dict3, you are in reality changing the inner dictionaries held by dict1 and dict2. To remedy this, either make copies of the underlying dictionaries before merging them, or write a merge function that copies each inner dictionary as it goes.
Using the deepcopy function:
from copy import deepcopy
dict3 = deepcopy(dict1) | deepcopy(dict2)
Now dict3 contains independent copies of dict1 and dict2
To merge the dicts:
def merge(d1, d2):
    rslt = dict()
    for k in d1.keys():
        rslt[k] = d1[k].copy()  # Note: still necessary to copy the underlying dict
    for k in d2.keys():
        rslt[k] = d2[k].copy()
    return rslt
then use:
dict3 = merge(dict1, dict2)

The issue you are having is that dict3 holds references to the sub-dicts in dict1 and dict2, and dict objects are mutable. So when you change a dict in one place, the change is visible everywhere that dict is referenced. You can verify this with the id() function. Example:
>>> print(id(dict1['key_val_1']))
140294429633472
>>> print(id(dict3['key_val_1']))
140294429633472
>>> print(id(dict2['key_val_2']))
140294429633728
>>> print(id(dict3['key_val_2']))
140294429633728
From the example above you can see that the sub-dicts in dict1 and dict2 are the very same objects referenced in dict3. So when you modify them through dict3, the original dicts are modified as well, because dicts are mutable.
To solve your issue, you need to make a copy of each sub-dict before merging them (a deep copy if the sub-dicts themselves contain mutable values).
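The explanation above can be checked directly. The following is a minimal sketch, assuming Python 3.9+ for the | operator; the names shallow and independent are illustrative and not from the original posts.

```python
from copy import deepcopy

dict1 = {'key_val_1': {'a': '1', 'b': '2', 'c': '3'}}
dict2 = {'key_val_2': {'d': '4', 'e': '5', 'f': '6'}}

# Shallow merge: the outer dict is new, but the inner dicts are shared.
shallow = dict1 | dict2
print(shallow['key_val_1'] is dict1['key_val_1'])  # True

# Deep-copy the merged result so every nested dict is duplicated as well.
independent = deepcopy(dict1 | dict2)
for key in independent:
    independent[key]['g'] = '7'

print('g' in dict1['key_val_1'])        # False
print('g' in independent['key_val_1'])  # True
```

Deep-copying the merged result once has the same effect here as deep-copying each operand before the merge.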

Related

Merge dictionaries to overwrite duplicate keys without overwriting duplicate and non-duplicate values

Input:
dict1 = {a: [xxx, zzz]}
dict2 = {a: [yyy, zzz]}
Desired output:
dict3 = {a: [xxx, zzz, yyy, zzz]}
I have tried:
dict3 = dict1 | dict2
and
dict3 = dict1.copy()
dict3 |= dict2
However, the merge (|) and update (|=) operators give precedence to the last-seen dict, resulting in:
dict3 = {a: [yyy, zzz]}
This is the intended behavior, as stated in PEP 584:
Dict union will return a new dict consisting of the left operand
merged with the right operand, each of which must be a dict (or an
instance of a dict subclass). If a key appears in both operands, the
last-seen value (i.e. that from the right-hand operand) wins
You may need to merge the two dicts by hand:
In [8]: dict1 = {'a': ['xxx', 'zzz']}
   ...: dict2 = {'a': ['yyy', 'zzz']}
   ...: for k, v in dict2.items():
   ...:     if k in dict1:
   ...:         dict1[k] += v
   ...:
In [9]: print(dict1)
{'a': ['xxx', 'zzz', 'yyy', 'zzz']}
The answer from xiez does not perform what the OP requested, for two reasons:
It updates the dict1 instead of creating a new, independent dict3
It misses out all items that exist solely in dict2, because it reacts only on those present in dict1 (line 4) - although this is not apparent from the one given example
A possible solution:
dict3 = {
    k: dict1.get(k, []) + dict2.get(k, [])
    for k in set(dict1) | set(dict2)
}
For each key in the united set of keys of both source dicts, we use the dicts' get function to extract the value of the key, if existing, or get the default empty list. Then we create the new dict3 by blindly adding the found or default lists for each key.
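The same idea extends to any number of input dicts. The following is a sketch only: merge_list_values is a hypothetical helper name, and the extra key 'b' is added here purely to show that keys present in just one dict survive the merge.

```python
from collections import defaultdict

def merge_list_values(*dicts):
    """Concatenate list values key by key into a new dict (hypothetical helper)."""
    merged = defaultdict(list)
    for d in dicts:
        for key, values in d.items():
            # extend() adds the items to a fresh list, so the inputs are not mutated
            merged[key].extend(values)
    return dict(merged)

dict1 = {'a': ['xxx', 'zzz']}
dict2 = {'a': ['yyy', 'zzz'], 'b': ['www']}
dict3 = merge_list_values(dict1, dict2)

print(dict3)  # {'a': ['xxx', 'zzz', 'yyy', 'zzz'], 'b': ['www']}
print(dict1)  # {'a': ['xxx', 'zzz']}
```

Unlike the set-based comprehension, this keeps the keys in first-seen order.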

I append dictionary to a list, while printing i get only the last appended thing in the dictionary of the list

class one:
    def __init__(self,id,d):
        self.id=id
        self.d=d
    def printfun(self):
        for i in l:
            print(i.id,i.d)
l=[]
d={}
for i in range(2):
    id=int(input())
    d["a"]=int(input())
    d["b"]=int(input())
    o=one(id,d)
    l.append(o)
o.printfun()
and my output is:
100
1
2
101
3
4
100 {'a': 3, 'b': 4}
101 {'a': 3, 'b': 4}
I append a dictionary to a list on each iteration, but when printing, every entry in the list shows only the values of the last dictionary appended. How can I get all the dictionaries I appended, and why am I not getting the first one back?
You need to create a new dictionary for each object, because otherwise every object holds a reference to the same dictionary, whose values have since been modified.
Python never implicitly copies objects. When you set dict2 = dict1, you are making them refer to the same exact dict object, so when you mutate it, all references to it keep referring to the object in its current state.
https://stackoverflow.com/a/2465932/4361039
l=[]
for i in range(2):
    id=int(input())
    a=int(input())
    b=int(input())
    o=one(id, {"a": a, "b": b})
    l.append(o)
o.printfun()
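The aliasing can be reproduced without the class or input() calls. This is a minimal sketch with hard-coded values; the names records, shared and fixed are illustrative.

```python
records = []
shared = {}
for a, b in [(1, 2), (3, 4)]:
    shared["a"] = a
    shared["b"] = b
    records.append(shared)  # appends a reference to the same dict each time

print(records)                   # [{'a': 3, 'b': 4}, {'a': 3, 'b': 4}]
print(records[0] is records[1])  # True

fixed = []
for a, b in [(1, 2), (3, 4)]:
    fixed.append({"a": a, "b": b})  # a fresh dict per iteration

print(fixed)  # [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
```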

Appending value to a list based on dictionary key

I started writing Python scripts for my research this past summer, and have been picking up the language as I go. For my current work, I have a dictionary of lists, sample_range_dict, that is initialized with descriptor_cols as the keys and empty lists for values. Sample code is below:
import numpy as np
import pandas as pd
def rangeFunc(arr):
    return (np.max(arr) - np.min(arr))
df_sample = pd.DataFrame(np.random.rand(2000, 4), columns=list("ABCD")) #random dataframe for testing
col_list = df_sample.columns
sample_range_dict = dict.fromkeys(col_list, []) #creates dictionary where each key pairs with an empty list
rand_df = df_sample.sample(n=20) #make a new dataframe with 20 random rows of df_sample
I want to go through each column from rand_df and calculate the range of values, putting each range in the list with the specified column name (e.g. sample_range_dict["A"] = [range in column A]). The following is the code I initially thought to use for this:
for d in col_list:
    sample_range_dict[d].append(rangeFunc(rand_df[d].tolist()))
However, instead of each key having one item in the list, printing sample_range_dict shows each key having an identical list of 4 values:
{'A': [0.8404352070810013,
0.9766398946246098,
0.9364714925930782,
0.9801082480908744],
'B': [0.8404352070810013,
0.9766398946246098,
0.9364714925930782,
0.9801082480908744],
'C': [0.8404352070810013,
0.9766398946246098,
0.9364714925930782,
0.9801082480908744],
'D': [0.8404352070810013,
0.9766398946246098,
0.9364714925930782,
0.9801082480908744]}
I've determined that the first value is the range for "A", second value is the range for "B", and so on. My question is about why this is happening, and how I could rewrite the code in order to get one item in the list for each key.
P.S. I'm looking to make this an iterative process, hence using lists instead of single numbers.
The issue is this line:
sample_range_dict = dict.fromkeys(col_list, [])
You only created one list. You don't have four lists with the same elements; you have one list, and four references to it. When you add to it via one reference, the element is visible through the other references, because it's the same list:
>>> a = dict.fromkeys(['x', 'y', 'z'], [])
>>> a['x'] is a['y']
True
>>> a['x'].append(5)
>>> a['y']
[5]
If you want each key to have a different list, either create a new list for each key:
>>> a = { k: [] for k in ['x', 'y', 'z'] }
>>> a['x'] is a['y']
False
>>> a['x'].append(5)
>>> a['y']
[]
Or use a defaultdict which will do it for you:
>>> from collections import defaultdict
>>> a = defaultdict(list)
>>> a['x'] is a['y']
False
>>> a['x'].append(5)
>>> a['y']
[]
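Applied to the loop from the question, the fix might look like the sketch below. To keep it self-contained it uses a plain dict of integer lists (columns) as a stand-in for rand_df, and range_func as a stand-in for rangeFunc; both are assumptions made for illustration.

```python
def range_func(values):
    """Spread between the largest and smallest value."""
    return max(values) - min(values)

# Stand-in for the sampled DataFrame: column name -> column values.
columns = {"A": [1, 9, 4], "B": [2, 3, 5]}

# One new list per key, instead of dict.fromkeys(columns, []).
sample_range_dict = {name: [] for name in columns}

for name, values in columns.items():
    sample_range_dict[name].append(range_func(values))

print(sample_range_dict)  # {'A': [8], 'B': [3]}
```

Repeating the loop on a new sample would append a second range to each list, which matches the iterative use described in the question.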

sort values of lists inside dictionary based on length of characters

d = {'A': ['A11117',
'33465'
'17160144',
'A11-33465',
'3040',
'A11-33465 W1',
'nor'], 'B': ['maD', 'vern', 'first', 'A2lRights']}
I have a dictionary d and I would like to sort the values based on length of characters. For instance, for key A the value A11-33465 W1 would be first because it contains 12 characters followed by 'A11-33465' because it contains 9 characters etc. I would like this output:
d = {'A': ['A11-33465 W1',
' A11-33465',
'17160144',
'A11117',
'33465',
'3040',
'nor'],
'B': ['A2lRights',
'first',
'vern',
'maD']}
(I understand that dictionaries are not able to be sorted but I have examples below that didn't work for me but the answer contains a dictionary that was sorted)
I have tried the following
python sorting dictionary by length of values
print(' '.join(sorted(d, key=lambda k: len(d[k]), reverse=True)))
Sort a dictionary by length of the value
sorted_items = sorted(d.items(), key = lambda item : len(item[1]))
newd = dict(sorted_items[-2:])
How do I sort a dictionary by value?
import operator
sorted_x = sorted(d.items(), key=operator.itemgetter(1))
But none of them give me what I am looking for.
How do I get my desired output?
You are not sorting the dict, you are sorting the lists inside it. The simplest will be a loop that sorts the lists in-place:
for k, lst in d.items():
    lst.sort(key=len, reverse=True)
This will turn d into:
{'A': ['3346517160144', 'A11-33465 W1', 'A11-33465', 'A11117', '3040', 'nor'],
'B': ['A2lRights', 'first', 'vern', 'maD']}
If you want to keep the original data intact, use a comprehension like:
sorted_d = {k: sorted(lst, key=len, reverse=True) for k, lst in d.items()}
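One detail worth knowing: Python's sort is stable, so strings of equal length keep their original relative order, even with reverse=True. If a deterministic tie-break is wanted, a tuple key is one option. The sketch below uses a small made-up list (words) to show the difference.

```python
words = ['nor', 'maD', 'vern']

# Stable sort: 'nor' and 'maD' have the same length and keep their input order.
by_length = sorted(words, key=len, reverse=True)
print(by_length)  # ['vern', 'nor', 'maD']

# Longest first, then alphabetical among strings of equal length.
by_length_then_alpha = sorted(words, key=lambda s: (-len(s), s))
print(by_length_then_alpha)  # ['vern', 'maD', 'nor']
```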

Merge Keys by common value from the same dictionary

Let's say that I have a dictionary that contains the following:
myDict = {'A':[1,2], 'B': [4,5], 'C': [1,2]}
I want to create a new dictionary, merged that merges keys by having similar values, so my merged would be:
merged = {['A', 'C']: [1, 2], 'B': [4, 5]}
I have tried using the method suggested in this thread, but cannot replicate what I need.
Any suggestions?
What you have asked for is not possible as written. The keys in your hypothetical dictionary are lists, and lists are mutable and therefore unhashable, so you can't use them as dictionary keys.
Edit: I had a go at doing what you asked for, except that the keys here are all tuples. This code is a mess, but you may be able to clean it up.
myDict = {'A': [1, 2],
          'B': [4, 5],
          'C': [1, 2],
          'D': [1, 2],
          }
# turn all values into hashable tuples
myDict2 = {k: tuple(v) for k, v in myDict.items()}
print(myDict2)
# make a set of the unique values
unique = {tuple(v) for v in myDict.values()}
print(unique)  # {(1, 2), (4, 5)} (set order may vary)
# Iterate over each unique value and build a temporary shared_keys list
# recording the keys under which that value is found. Add the new
# key/value pairs to a new dictionary.
new_dict = {}
for value in unique:
    shared_keys = []
    for key in myDict:
        if tuple(myDict[key]) == value:
            shared_keys.append(key)
    new_dict[tuple(shared_keys)] = value
print(new_dict)  # {('A', 'C', 'D'): (1, 2), ('B',): (4, 5)} (key order may vary)
# change the values back into mutable lists from tuples
final_dict = {k: list(v) for k, v in new_dict.items()}
print(final_dict)
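A shorter variant of the same approach is to group the keys in a single pass with collections.defaultdict. This is only a sketch: the names my_dict and groups are illustrative, and like the code above it produces tuple keys, since lists cannot be dictionary keys.

```python
from collections import defaultdict

my_dict = {'A': [1, 2], 'B': [4, 5], 'C': [1, 2]}

# Map each value (as a hashable tuple) to the list of keys that share it.
groups = defaultdict(list)
for key, value in my_dict.items():
    groups[tuple(value)].append(key)

# Invert: tuple of keys -> original value, converted back to a list.
merged = {tuple(keys): list(value) for value, keys in groups.items()}
print(merged)  # {('A', 'C'): [1, 2], ('B',): [4, 5]}
```

Because dicts preserve insertion order, the groups appear in the order their values are first seen in my_dict.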
