How to get keys from a nested dictionary of arbitrary depth in Python

I have a dictionary object in Python. Let's call it dict. This object could contain another dictionary, which may in turn contain another dictionary, and so on.
dict = { 'k': v, 'k1': v1, 'dict2':{'k3': v3, 'k4':v4} , 'dict3':{'k5':v5, dict4:{'k6':v6}}}
This is just an example. The outermost dictionary could have any length. I want to extract keys from such a dictionary object in the following two ways:
Get a list of only the keys.
[k,k1,k2,k3,k4,k5,k6]
Get a list of keys for each dictionary, associated with the dictionary that contains them, so something like this:
outer_dict_keys = [k ,dict2, dict3]
dict2_keys = [k3,k4]
dict3_keys = [k5, dict4]
dict4_keys = [k6]
The length of the outermost dictionary is always changing, so I cannot hard-code anything.
What is the best way to achieve the above result?

Use a mix of iteration and recursion. After quoting undefined names, making spacing uniform, and removing 'k2' from the first result, I came up with the code below. (Written and tested for 3.4, it should run on any 3.x and might on 2.7.) A key thing to remember is that, before Python 3.7 made insertion order part of the language, the iteration order of dicts is arbitrary and can vary from run to run, which is why the code iterates over sorted(dic). Recursion as done here visits sub-dicts in depth-first rather than breadth-first order. For dict0, both are the same, but if dict4 were nested in dict2 rather than dict3, they would not be.
dict0 = {'k0': 0, 'k1': 1, 'dict2': {'k3': 3, 'k4': 4},
         'dict3': {'k5': 5, 'dict4': {'k6': 6}}}

def keys(dic, klist=None):
    # Use None instead of a mutable default list, so repeated calls start fresh.
    if klist is None:
        klist = []
    subdics = []
    for key in sorted(dic):
        val = dic[key]
        if isinstance(val, dict):
            subdics.append(val)
        else:
            klist.append(key)
    # Visit the sub-dicts only after the plain keys of this level are collected.
    for subdict in subdics:
        keys(subdict, klist)
    return klist

result = keys(dict0)
print(result, '\n', result == ['k0','k1','k3','k4','k5','k6'])
def keylines(dic, name='outer_dict', lines=None):
    # Use None instead of a mutable default list, so repeated calls start fresh.
    if lines is None:
        lines = []
    vals = []
    subdics = []
    for key in sorted(dic):
        val = dic[key]
        if isinstance(val, dict):
            subdics.append((key, val))
        else:
            vals.append(key)
    # List the names of the sub-dicts after the plain keys.
    vals.extend(pair[0] for pair in subdics)
    lines.append('{}_keys = {}'.format(name, vals))
    for subdict in subdics:
        keylines(subdict[1], subdict[0], lines)
    return lines

result = keylines(dict0)
for line in result:
    print(line)
print()

# Report any line that differs from the expected output, character by character.
expect = [
    "outer_dict_keys = ['k0', 'k1', 'dict2', 'dict3']",
    "dict2_keys = ['k3', 'k4']",
    "dict3_keys = ['k5', 'dict4']",
    "dict4_keys = ['k6']"]
for actual, want in zip(result, expect):
    if actual != want:
        print(want)
        for i, (c1, c2) in enumerate(zip(actual, want)):
            if c1 != c2:
                print(i, c1, c2)
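To make the depth-first versus breadth-first remark concrete, here is a minimal sketch that is not part of the original answer. It visits sub-dicts level by level using collections.deque. The function name keys_breadth_first and the sample dict1 (dict4 moved under dict2) are assumptions made for illustration.

```python
from collections import deque

def keys_breadth_first(dic):
    """Return the non-dict keys, visiting sub-dicts level by level."""
    klist = []
    queue = deque([dic])
    while queue:
        current = queue.popleft()
        for key in sorted(current):
            val = current[key]
            if isinstance(val, dict):
                queue.append(val)   # handle this sub-dict after the current level
            else:
                klist.append(key)
    return klist

# Hypothetical variant of dict0 with dict4 nested in dict2 instead of dict3.
dict1 = {'k0': 0, 'k1': 1, 'dict2': {'k3': 3, 'k4': 4, 'dict4': {'k6': 6}},
         'dict3': {'k5': 5}}
print(keys_breadth_first(dict1))
```

With this input, the breadth-first order lists k5 before k6, whereas a depth-first traversal would reach k6 (inside dict2) before k5.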

Related

How do I remove all duplicate entries from one nested dictionary that appear in another?

Assuming my dictionaries are set up like this:
dict_a = {"first_key": {"second_key": "value1", "third_key": "value2"}}
dict_b = {"first_key": {"third_key": "value2"}}
I want to be left with this:
dict_a = {"first_key": {"second_key": "value1"}}
I've tried a few different ways of getting there, like this:
dict(dict_a.items() - dict_b.items())
But that tells me dicts are unhashable. Trying this method:
dict_c = {k: dict_a[k] for k in dict_a if k not in dict_b}
leaves me with an empty dictionary. I also tried this:
for k, v in dict_b.items():
    if (k, v) in dict_a.items():
        dict_a.pop(k, v)
But again, no luck there. It ended up not modifying dict_a at all.
additional_key_to_remove = []
for key, value in dict_b.items():
    if key not in dict_a:
        continue  # nothing to remove for keys that dict_a lacks
    if isinstance(dict_a[key], dict) and isinstance(value, dict):
        # Both values are dictionaries: remove the shared nested keys.
        sub_dict = dict_a[key]
        for k in value:
            sub_dict.pop(k, None)
    elif isinstance(dict_a[key], str) and isinstance(value, str):
        # Collect the non-nested keys for later removal.
        additional_key_to_remove.append(key)
for key in additional_key_to_remove:
    del dict_a[key]
print(dict_a)
Output:
{'first_key': {'second_key': 'value1'}}
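The code above removes keys by name only. If an entry should be dropped only when its value also matches, and nesting may go deeper than one level, a recursive variant is one option. This is a sketch under assumptions, not the original answerer's method: the name remove_duplicates is made up, and deleting sub-dicts that become empty is a design choice you may not want.

```python
def remove_duplicates(target, other):
    """Delete from target every entry that also appears in other with an equal value.

    Nested dicts are compared recursively.
    """
    for key, other_val in other.items():
        if key not in target:
            continue
        target_val = target[key]
        if isinstance(target_val, dict) and isinstance(other_val, dict):
            remove_duplicates(target_val, other_val)
            if not target_val:      # assumption: drop sub-dicts that became empty
                del target[key]
        elif target_val == other_val:
            del target[key]
    return target

dict_a = {"first_key": {"second_key": "value1", "third_key": "value2"}}
dict_b = {"first_key": {"third_key": "value2"}}
print(remove_duplicates(dict_a, dict_b))
```

The loop iterates over other while mutating target, so it never changes the dictionary it is iterating over.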

Is there a simpler way to extract the last value of a dictionary?

So I was tasked with writing a Python function that returns how many values there are in a dictionary that contains ONLY lists. An example of such a dictionary would be:
animals = { 'a': ['alpaca','ardvark'], 'b': ['baboon'], 'c': ['coati']}
The values inside the list also count towards the total values returned from the function, which means that it has to return 4. This is the function I made:
def how_many(aDict):
    '''
    aDict: A dictionary, where all the values are lists.
    returns: int, how many values are in the dictionary.
    '''
    numValues = 0
    while aDict != {}:
        tupKeyValue = aDict.popitem()
        List = tupKeyValue[1]
        numValues += len(List)
    return numValues
So I was wondering if there is a way to pop the last value of a dictionary without popitem(), which extracts the whole key-value pair. Just trying to make it as simple as possible.
Since you are not using the dictionary's keys, you could just use values() along with sum():
def how_many(d):
    return sum(len(v) for v in d.values())
animals = {'a': ['alpaca', 'ardvark'], 'b': ['baboon'], 'c': ['coati']}
print(how_many(animals))
Output:
4

python3 value returned wrong with container variable

I ran into some code that did not behave as I expected. Details below:
a = ['name']
b = [('name=cheng',),('name=huang',),('name=pan',)]
Dict = {}
c = []
for i in range(0,3):
    for j in range(0,1):
        Dict[a[j]] = b[i][j]
    c.append(Dict)
print(c)
>>> [{'name':'name=pan'},{'name':'name=pan'},{'name':'name=pan'}]
What I expected is:
>>> [{'name':'name=cheng'},{'name':'name=huang'},{'name':'name=pan'}]
Could you please tell me how to solve this issue?
You are changing the value of Dict in place and not creating a new dictionary every time. In each iteration of the loop, you set Dict["name"] to one of the elements in b and then append Dict to the list. The next iteration of your loop changes Dict in place, meaning the reference you previously appended to c reflects the change as well. The result is that your list c holds three references to one and the same dictionary (the exact same location in memory), showing the value set in the final iteration of the loop.
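To see this aliasing directly, here is a minimal sketch that checks object identity with the is operator. The names shared and aliases are made up for illustration.

```python
shared = {}
aliases = []
for value in ('name=cheng', 'name=huang', 'name=pan'):
    shared['name'] = value   # mutates the one existing dict
    aliases.append(shared)   # appends another reference to that same dict

# Every element is the very same object, so all of them show the last value.
print(all(item is shared for item in aliases))
print(len({id(item) for item in aliases}))
print(aliases)
```

The first print confirms that each list element is the object named shared, and the second shows there is only one distinct object identity in the list.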
How do you fix this? Make a new dictionary every time.
a = ['name']
b = [('name=cheng',),('name=huang',),('name=pan',)]
c = []
for i in range(0,3):
    for j in range(0,1):
        temp_dict = {a[j]: b[i][j]}
        c.append(temp_dict)
print(c)
Result:
[{'name': 'name=cheng'}, {'name': 'name=huang'}, {'name': 'name=pan'}]
You use the same Dict object for all of the iterations of the loop, so every list element is the same dictionary. You just have three references to one dictionary in the list.
If you move the Dict = {} statement into the loop, it will be fixed.
a = ['name']
b = [('name=cheng',),('name=huang',),('name=pan',)]
c = []
for i in range(0,3):
    Dict = {}
    for j in range(0,1):
        Dict[a[j]] = b[i][j]
    c.append(Dict)
print(c)
Or more Pythonic:
keys = ['name']
values_list = [('name=cheng',), ('name=huang',), ('name=pan',)]
result = []
for values in values_list:
    result.append(dict(zip(keys, values)))
print(result)
This works by using the zip builtin, which does the same thing as [(x[i], y[i]) for i in range(min(len(x), len(y)))] without needing to keep track of the indices or lengths.
The dict class can build a dictionary from a list of tuples, which is what this solution uses.
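As a small illustration of the two points above, the following sketch shows the pairs that zip produces and how it stops at the shorter input. The second key 'age' and the sample rows are hypothetical, added only to make the truncation visible.

```python
keys = ['name', 'age']                                # 'age' is a hypothetical second key
values_list = [('name=cheng', 30), ('name=huang',)]   # the second row is shorter on purpose

# zip pairs items by position and stops at the shorter input.
pairs = list(zip(keys, values_list[0]))
print(pairs)

# dict() accepts any iterable of (key, value) pairs, so the loop fits in a comprehension.
result = [dict(zip(keys, values)) for values in values_list]
print(result)
```

Because zip truncates silently, a row with a missing value simply yields a dictionary without that key.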

Pyspark Runtime Error Dictionary Changed size during iteration [duplicate]

I have an object like this:
{'hello': 'world', 'foo.0.bar': v1, 'foo.0.name': v2, 'foo.1.bar': v3}
It should be expanded to:
{'hello': 'world', 'foo': [{'bar': v1, 'name': v2}, {'bar': v3}]}
I wrote the code below, which splits each key on '.', removes the old key, and adds the new nested key if the key contains '.', but it fails with RuntimeError: dictionary changed size during iteration
def expand(obj):
    for k in obj.keys():
        expandField(obj, k, v)

def expandField(obj, f, v):
    parts = f.split('.')
    if(len(parts) == 1):
        return
    del obj[f]
    for i in xrange(0, len(parts) - 1):
        f = parts[i]
        currobj = obj.get(f)
        if (currobj == None):
            nextf = parts[i + 1]
            currobj = obj[f] = re.match(r'\d+', nextf) and [] or {}
        obj = currobj
    obj[len(parts) - 1] = v
for k, v in obj.iteritems():
RuntimeError: dictionary changed size during iteration
Like the message says: you changed the number of entries in obj inside of expandField() while in the middle of looping over these entries in expand().
You might try instead creating a new dictionary of the form you wish, or somehow recording the changes you want to make, and then making them AFTER the loop is done.
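One way to build a new dictionary instead of mutating the one being iterated is sketched below. It is an illustration under assumptions, not the asker's code: the helper name _listify is made up, purely numeric key parts are treated as list indices, and it is assumed that no key is both a plain value and a prefix of another key (for example 'foo' together with 'foo.0.bar').

```python
def expand(flat):
    """Build a new nested structure from dotted keys, leaving flat untouched."""
    root = {}
    for dotted_key, value in flat.items():
        parts = dotted_key.split('.')
        node = root
        for part in parts[:-1]:
            # Numeric parts become int keys; they are turned into list positions later.
            key = int(part) if part.isdecimal() else part
            node = node.setdefault(key, {})
        last = parts[-1]
        node[int(last) if last.isdecimal() else last] = value
    return _listify(root)

def _listify(node):
    """Recursively convert dicts whose keys are all ints into lists ordered by index."""
    if not isinstance(node, dict):
        return node
    converted = {k: _listify(v) for k, v in node.items()}
    if converted and all(isinstance(k, int) for k in converted):
        return [converted[k] for k in sorted(converted)]
    return converted

obj = {'hello': 'world', 'foo.0.bar': 'v1', 'foo.0.name': 'v2', 'foo.1.bar': 'v3'}
print(expand(obj))
```

Since only the new root dictionary is modified inside the loop, iterating over flat.items() is safe. Note that gaps in the indices (say, only 'foo.0' and 'foo.2') are compacted rather than padded in this sketch.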
You might want to copy your keys into a list and iterate over that list instead, e.g.:
def expand(obj):
    keys = list(obj.keys())  # snapshot the keys so the dict may change during the loop
    for k in keys:
        expandField(obj, k, obj[k])
I'll let you check whether the resulting behavior matches what you expect.
Edited as per comments, thank you !
I had a similar issue when I wanted to change a dictionary's structure (removing and adding dicts within other dicts).
In my situation I created a deepcopy of the dict. With the copy in hand, I was able to iterate over one dictionary while removing keys from the other as needed. From the Python documentation on deepcopy:
A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.
Hope this helps!
For those experiencing
RuntimeError: dictionary changed size during iteration
also make sure you're not iterating through a defaultdict when trying to access a non-existent key! I caught myself doing that inside the for loop, which caused the defaultdict to create a default value for this key, causing the aforementioned error.
The solution is to convert your defaultdict to dict before looping through it, i.e.
d = defaultdict(int)
d_new = dict(d)
or make sure you're not adding/removing any keys while iterating through it.
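The defaultdict pitfall is easy to reproduce. The sketch below is illustrative only (the key names are made up): it triggers the error with a plain lookup, then shows two ways to avoid it.

```python
from collections import defaultdict

d = defaultdict(int)
d['a'] = 1

try:
    for key in d:
        d[key + '_missing']   # a mere lookup inserts the missing key with the default 0
except RuntimeError as exc:
    error = str(exc)
    print(error)

# Safe alternatives: iterate over a snapshot of the keys,
# and use .get(), which never inserts a default.
for key in list(d):
    d.get(key + '_other')

print(sorted(d))
```

The final print shows that the failed loop did add one key before the error was raised, while the .get() lookups added none.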
Rewriting this part
def expand(obj):
    for k in obj.keys():
        expandField(obj, k, v)
to the following
def expand(obj):
    keys = list(obj.keys())  # in Python 3, keys() is a live view, so copy it first
    for k in keys:
        if k in obj:
            expandField(obj, k, obj[k])
should make it work.

updating tuple string and how to optimize my code

I have a list like this:
[('__label__c091cb93-c737-4a67-95d7-49feecc6456c', 0.5), ('__label__96693d45-4dec-4b66-a2e2-621329d64b92', 0.498047)]
I want to replace the string element of each tuple, like this:
'__label__c091cb93-c737-4a67-95d7-49feecc6456c' to 'c091cb93-c737-4a67-95d7-49feecc6456c'
I tried this:
l = [('__label__c091cb93-c737-4a67-95d7-49feecc6456c', 0.5), ('__label__96693d45-4dec-4b66-a2e2-621329d64b92', 0.498047)]
j = []
for x in l:
    for y in x:
        if type(y) == str:
            z = y.replace('__label__',"")
            j.append((z, x[1]))
print(j)
Output:
[('c091cb93-c737-4a67-95d7-49feecc6456c', 0.5), ('96693d45-4dec-4b66-a2e2-621329d64b92', 0.498047)]
How can I optimize my code in a Pythonic way, and is there another way to update the tuple values, given that tuples are immutable?
You are right, tuples are immutable in Python, but lists are not. So you can update the list l in place.
Moreover, it looks like you already know the position of the element you have to modify and the position of the substring you want to remove, so you can avoid one loop and the replace call, which would scan the string once more.
for i in range(len(l)):
    the_tuple = l[i]
    if isinstance(the_tuple[0], str) and the_tuple[0].startswith('__label__'):
        l[i] = (the_tuple[0][len('__label__'):], the_tuple[1])
# You could also replace len('__label__') with the literal 9 to save a call,
# though the gain is negligible.
You can use the map function:
data = [('__label__c091cb93-c737-4a67-95d7-49feecc6456c', 0.5), ('__label__96693d45-4dec-4b66-a2e2-621329d64b92', 0.498047)]
def f(row): return row[0].replace('__label__', ''), row[1]
print(list(map(f, data)))
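A list comprehension with tuple unpacking is another common way to write this. The sketch below uses a hypothetical helper strip_prefix that removes the prefix only when it is actually at the start of the string, unlike replace, which removes every occurrence. On Python 3.9 and later, the built-in str.removeprefix does the same job.

```python
PREFIX = '__label__'

def strip_prefix(text, prefix=PREFIX):
    """Return text without the leading prefix, or unchanged if the prefix is absent."""
    return text[len(prefix):] if text.startswith(prefix) else text

data = [('__label__c091cb93-c737-4a67-95d7-49feecc6456c', 0.5),
        ('__label__96693d45-4dec-4b66-a2e2-621329d64b92', 0.498047)]

# Build new tuples, since the existing ones cannot be modified in place.
cleaned = [(strip_prefix(label), score) for label, score in data]
print(cleaned)
```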
