Pyspark Runtime Error Dictionary Changed size during iteration [duplicate] - apache-spark

I have an object like this:
{hello: 'world', "foo.0.bar": v1, "foo.0.name": v2, "foo.1.bar": v3}
It should be expanded to:
{ hello: 'world', foo: [{'bar': v1, 'name': v2}, {bar: v3}]}
I wrote the code below, which splits each key on '.', removes the old key, and adds the new nested key if the key contains '.', but it fails with RuntimeError: dictionary changed size during iteration
def expand(obj):
    for k in obj.keys():
        expandField(obj, k, v)

def expandField(obj, f, v):
    parts = f.split('.')
    if(len(parts) == 1):
        return
    del obj[f]
    for i in xrange(0, len(parts) - 1):
        f = parts[i]
        currobj = obj.get(f)
        if (currobj == None):
            nextf = parts[i + 1]
            currobj = obj[f] = re.match(r'\d+', nextf) and [] or {}
        obj = currobj
    obj[len(parts) - 1] = v

Traceback (last lines):
    for k, v in obj.iteritems():
RuntimeError: dictionary changed size during iteration

Like the message says: you changed the number of entries in obj inside expandField() while expand() was still in the middle of looping over those entries.
You might try instead creating a new dictionary of the form you wish, or somehow recording the changes you want to make, and then making them AFTER the loop is done.
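The suggestion to build a new dictionary could look roughly like the sketch below. It is written for Python 3 and is not the asker's code: the helper names _get and _put are made up for illustration, and it assumes that a purely numeric path segment always means a list index.

```python
def _get(container, part):
    """Return the child stored under `part`, or None if it does not exist yet."""
    if isinstance(container, list):
        index = int(part)
        return container[index] if index < len(container) else None
    return container.get(part)


def _put(container, part, value):
    """Store `value` under `part`, padding lists with None when needed."""
    if isinstance(container, list):
        index = int(part)
        container.extend([None] * (index + 1 - len(container)))
        container[index] = value
    else:
        container[part] = value


def expand(flat):
    """Build a new nested structure from dotted keys; `flat` is never modified."""
    result = {}
    for dotted_key, value in flat.items():
        parts = dotted_key.split('.')
        container = result
        # Walk every segment except the last, creating children on demand.
        for part, next_part in zip(parts, parts[1:]):
            child = _get(container, part)
            if child is None:
                # Assumption: a numeric next segment means "list", else "dict".
                child = [] if next_part.isdigit() else {}
                _put(container, part, child)
            container = child
        _put(container, parts[-1], value)
    return result


flat = {'hello': 'world', 'foo.0.bar': 1, 'foo.0.name': 2, 'foo.1.bar': 3}
print(expand(flat))
```

Because the loop only reads from the input and writes to a separate result, the size of the dictionary being iterated never changes.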

You might want to copy your keys into a list and iterate over that list instead of the dict itself, e.g.:
def expand(obj):
    keys = list(obj.keys()) # freeze keys iterator into a list
    for k in keys:
        expandField(obj, k, v)
I'll let you check whether the resulting behavior matches what you expect.
Edited as per comments, thank you!

I had a similar issue when I wanted to change a dictionary's structure by removing and adding dicts nested within other dicts.
For my situation I created a deepcopy of the dict. With a deepcopy of my dict, I was able to iterate through the copy and remove keys from the original as needed. From the Python documentation for deepcopy:
A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.
Hope this helps!
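A minimal sketch of that deepcopy approach, using made-up data (the names config and snapshot are only for illustration): the loops walk the copy, while every deletion is applied to the original.

```python
import copy

# Hypothetical nested data: remove inner entries whose value is None,
# then drop any inner dict that ends up empty.
config = {"a": {"keep": 1, "drop": None}, "b": {"drop": None}}

snapshot = copy.deepcopy(config)  # independent copy, safe to iterate over
for outer_key, inner in snapshot.items():
    for inner_key, value in inner.items():
        if value is None:
            del config[outer_key][inner_key]
    if not config[outer_key]:
        del config[outer_key]

print(config)
```

For flat dictionaries a deep copy is usually more than is needed; iterating over list(d) is typically enough and cheaper.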

For those experiencing
RuntimeError: dictionary changed size during iteration
also make sure you're not iterating through a defaultdict when trying to access a non-existent key! I caught myself doing that inside the for loop, which caused the defaultdict to create a default value for this key, causing the aforementioned error.
The solution is to convert your defaultdict to dict before looping through it, i.e.
d = defaultdict(int)
d_new = dict(d)
or make sure you're not adding/removing any keys while iterating through it.
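The defaultdict pitfall can be reproduced with a small sketch like the following (the variable names are made up for illustration). Reading a missing key with square brackets inserts it, which is what changes the size mid-loop; .get() does not insert.

```python
from collections import defaultdict

counts = defaultdict(int, {"a": 1, "b": 2})

error_message = None
try:
    for key in counts:
        # Indexing a missing key silently inserts it with the default value 0.
        counts[key + "_missing"]
except RuntimeError as exc:
    error_message = str(exc)
print(error_message)

# Converting to a plain dict (or using .get) avoids the hidden insertion.
frozen = dict(counts)
for key in frozen:
    frozen.get(key + "_missing", 0)  # returns 0 without adding a key
print(len(frozen))
```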

Rewriting this part
def expand(obj):
    for k in obj.keys():
        expandField(obj, k, v)
to the following
def expand(obj):
    keys = list(obj.keys())  # in Python 3, keys() is a live view, so copy it first
    for k in keys:
        if k in obj:
            expandField(obj, k, v)
should make it work.

Related

How do I remove all duplicate entries from one nested dictionary that appear in another?

Assuming my dictionaries are set up like this:
dict_a = {"first_key": {"second_key": "value1", "third_key": "value2"}}
dict_b = {"first_key": {"third_key": "value2"}}
I want to be left with this:
dict_a = {"first_key": {"second_key": "value1"}}
I've tried a few different ways of getting there, like this:
dict(dict_a.items() - dict_b.items())
But that tells me dicts are unhashable. Trying this method:
dict_c = {k:dict_a[k] for k in dict_a if k not in dict_b}
leaves me with an empty dictionary. I also tried this:
for k, v in dict_b.items():
    if (k, v) in dict_a.items():
        dict_a.pop(k, v)
But again, no luck there. It ended up not modifying dict_a at all.
additional_key_to_remove = []
for key, value in dict_b.items():
    if isinstance(dict_a[key], dict) and isinstance(value, dict):
        # make sure the operation happens on dictionaries and not on other data structures
        sub_dict = dict_a[key]
        for k in value:
            sub_dict.pop(k, None)
    elif isinstance(dict_a[key], str) and isinstance(value, str):
        # collect the non-nested keys for later removal
        additional_key_to_remove.append(key)
for key in additional_key_to_remove:
    del dict_a[key]
print(dict_a)
Output:
{'first_key': {'second_key': 'value1'}}
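For nesting deeper than two levels, a recursive variant might look like the sketch below. The function name remove_shared is hypothetical, and two design choices are assumptions rather than part of the question: an entry is removed only when both key and value match, and sub-dicts that become empty are dropped.

```python
def remove_shared(target, other):
    """Remove from `target` every entry that also appears in `other`."""
    for key in list(target):  # snapshot of the keys, so deleting is safe
        if key not in other:
            continue
        if isinstance(target[key], dict) and isinstance(other[key], dict):
            remove_shared(target[key], other[key])
            if not target[key]:  # assumption: drop sub-dicts left empty
                del target[key]
        elif target[key] == other[key]:
            del target[key]
    return target


dict_a = {"first_key": {"second_key": "value1", "third_key": "value2"}}
dict_b = {"first_key": {"third_key": "value2"}}
print(remove_shared(dict_a, dict_b))
```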

Efficiently Perform Nested Dictionary Lookups and List Appending Using Numpy Nonzero Indices

I have working code to perform a nested dictionary lookup and append results of another lookup to each key's list using the results of numpy's nonzero lookup function. Basically, I need a list of strings appended to a dictionary. These strings and the dictionary's keys are hashed at one point to integers and kept track of using separate dictionaries with the integer hash as the key and the string as the value. I need to look up these hashed values and store the string results in the dictionary. It's confusing so hopefully looking at the code helps. Here's a simplified version of code:
for key in ResultDictionary:
    ResultDictionary[key] = []
true_indices = np.nonzero(numpy_array_of_booleans)
for idx in range(0, len(true_indices[0])):
    ResultDictionary.get(HashDictA.get(true_indices[0][idx])).append(HashDictB.get(true_indices[1][idx]))
This code works for me, but I am hoping there's a way to improve the efficiency. I am not sure if I'm limited due to the nested lookup. The speed is also dependent on the number of true results returned by the nonzero function. Any thoughts on this? Appreciate any suggestions.
Here are two suggestions:
1) since your hash dicts are keyed with ints it might help to transform them into arrays or even lists for faster lookup if that is an option.
k, v = map(list, (HashDictB.keys(), HashDictB.values()))
mxk, mxv = max(k), max(map(len, v))  # largest key, length of the longest string
lookupB = np.empty((mxk+1,), dtype=f'U{mxv}')
lookupB[k] = v
2) you can probably save a number of lookups in ResultDictionary and HashDictA by processing your numpy_array_of_booleans row-wise:
i, j = np.where(numpy_array_of_booleans)
bnds, = np.where(np.r_[True, i[:-1] != i[1:], True])
ResultDict = {HashDictA[i[l]]: [HashDictB[jj] for jj in j[l:r]] for l, r in zip(bnds[:-1], bnds[1:])}
2b) if for some reason you need to incrementally add associations you could do something like (I'll shorten variable names for that)
from operator import itemgetter
res = {}
def add_batch(data, res, hA, hB):
    i, j = np.where(data)
    bnds, = np.where(np.r_[True, i[:-1] != i[1:], True])
    for l, r in zip(bnds[:-1], bnds[1:]):
        if l+1 == r:
            res.setdefault(hA[i[l]], set()).add(hB[j[l]])
        else:
            res.setdefault(hA[i[l]], set()).update(itemgetter(*j[l:r])(hB))
You can't do much about the dictionary lookups - you have to do those one at a time.
You can clean up the array indexing a bit:
idxes = np.argwhere(numpy_array_of_booleans)
for i,j in idxes:
    ResultDictionary.get(HashDictA.get(i)).append(HashDictB.get(j))
argwhere is transpose(nonzero(...)), turning the tuple of arrays into a (n,2) array of index pairs. I don't think this makes a difference in speed, but the code is cleaner.
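To make the row-wise grouping idea concrete, here is a small self-contained sketch. The data and the names hash_a, hash_b and mask are invented for illustration, and the code assumes the boolean array contains at least one True.

```python
import numpy as np

# Hypothetical lookup tables: row index -> key string, column index -> value string.
hash_a = {0: "alpha", 1: "beta", 2: "gamma"}
hash_b = {0: "x", 1: "y", 2: "z"}
mask = np.array([[True, False, True],
                 [False, False, False],
                 [False, True, False]])

rows, cols = np.nonzero(mask)  # indices come back in row-major order
# A new group starts wherever the row index changes; add both end points.
bounds = np.flatnonzero(np.r_[True, rows[:-1] != rows[1:], True])

result = {
    hash_a[int(rows[lo])]: [hash_b[int(c)] for c in cols[lo:hi]]
    for lo, hi in zip(bounds[:-1], bounds[1:])
}
print(result)
```

Each row key is looked up once per group instead of once per True entry. Rows without any True value (here "beta") do not appear in the result, which differs from the original loop that pre-fills every key with an empty list.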

python iterate over dictionary empties it

I have some code I'm analyzing, and I've found that iterating over a dictionary empties it. I worked around the problem by making a deepcopy of the dictionary and iterating over the copy in the code that displays the values, then later iterating over the original dictionary to assign values to a 2D array. Why does iterating over the original dictionary to display it empty it, so that the dictionary is unusable later? Any replies welcome.
import copy
# This line fixed the problem
trans = copy.deepcopy(transitions)
print ("\nTransistions = ")
# Original line was:
# for state, next_states in transitions.items():
# Which empties the dictionary, so not usable after that
for state, next_states in trans.items():
    for i in next_states:
        print("\nstate = ", state, " next_state = ", i)
# Later code which with original for loop showed empty dictionary
for state, next_states in transitions.items():
    for next_state in next_states:
        print("\n one_step trans state = ", state, " next_state = ", next_state)
        one_step[state,next_state] += 1
A print of the dictionary:
Transistions =
{0: <map object at 0x0000000003391550>, 1: <map object at 0x00000000033911D0>, 2: <map object at 0x0000000003391400>, 3: <map object at 0x00000000033915F8>, 4: <map object at 0x0000000003391320>}
Type:
Transistions =
<class 'dict'>
Edit: Here's the code that uses map. Any suggestions on how to edit it to create the dictionary without using map?
numbers = dict((state_set, n) for n, state_set in enumerate(sets))
transitions = {}
for state_set, next_sets in state_set_transitions.items():
    dstate = numbers[state_set]
    transitions[dstate] = map(numbers.get, next_sets)
Iterating over a dict doesn't empty it. Iterating over a map iterator empties it.
Wherever you generated your transitions dict, you should have used a list comprehension instead of map to create lists instead of iterators for the values:
[whatever for x in thing]
instead of
map(lambda x: whatever, thing)
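The difference can be seen in a short Python 3 sketch. The sample sets below are made up; only the shape of the code follows the question.

```python
# Hypothetical stand-ins for the asker's data.
sets = [frozenset({"a"}), frozenset({"a", "b"})]
state_set_transitions = {sets[0]: [sets[1]], sets[1]: [sets[0], sets[1]]}
numbers = {state_set: n for n, state_set in enumerate(sets)}

# A map object is a one-shot iterator: the second pass finds nothing left.
lazy = map(numbers.get, state_set_transitions[sets[1]])
first_pass = list(lazy)
second_pass = list(lazy)
print(first_pass, second_pass)

# Building lists with a comprehension gives values that can be reused freely.
transitions = {
    numbers[state_set]: [numbers[s] for s in next_sets]
    for state_set, next_sets in state_set_transitions.items()
}
print(transitions)
```

Wrapping the original call as list(map(numbers.get, next_sets)) would have the same effect as the comprehension.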

How to get keys from nested dictionary of arbitrary length in Python

I have a dictionary object in Python. Let's call it dict. This object can contain another dictionary, which may in turn contain another dictionary, and so on.
dict = { 'k': v, 'k1': v1, 'dict2':{'k3': v3, 'k4':v4} , 'dict3':{'k5':v5, dict4:{'k6':v6}}}
This is just an example. The outermost dictionary can have any length. I want to extract keys from such a dictionary object in the following two ways:
1. Get a list of only the keys:
[k,k1,k2,k3,k4,k5,k6]
2. Get a list of keys for each dictionary, labelled with the dictionary it belongs to, something like this:
outer_dict_keys = [k ,dict2, dict3]
dict2_keys = [k3,k4]
dict3_keys = [k5, dict4]
dict4_keys = [k6]
The length of the outermost dictionary dict is always changing, so I cannot hard-code anything.
What is the best way to achieve the above result?
Use a mix of iteration and recursion. After quoting the undefined names, making the spacing uniform, and removing 'k2' from the first result, I came up with the code below. (Written and tested for 3.4; it should run on any 3.x and might on 2.7.) A key thing to remember is that, in the Python versions this was written for, the iteration order of dicts is essentially arbitrary and can vary between runs (since Python 3.7, dicts preserve insertion order), which is why the code sorts the keys. Recursion as done here visits sub-dicts in depth-first rather than breadth-first order. For dict0, both are the same, but if dict4 were nested in dict2 rather than dict3, they would not be.
dict0 = {'k0': 0, 'k1': 1, 'dict2':{'k3': 3, 'k4': 4},
         'dict3':{'k5': 5, 'dict4':{'k6': 6}}}

def keys(dic, klist=None):
    if klist is None:  # avoid sharing one default list between calls
        klist = []
    subdics = []
    for key in sorted(dic):
        val = dic[key]
        if isinstance(val, dict):
            subdics.append(val)
        else:
            klist.append(key)
    for subdict in subdics:
        keys(subdict, klist)
    return klist

result = keys(dict0)
print(result, '\n', result == ['k0','k1','k3','k4','k5','k6'])

def keylines(dic, name='outer_dict', lines=None):
    if lines is None:  # avoid sharing one default list between calls
        lines = []
    vals = []
    subdics = []
    for key in sorted(dic):
        val = dic[key]
        if isinstance(val, dict):
            subdics.append((key,val))
        else:
            vals.append(key)
    vals.extend(pair[0] for pair in subdics)
    lines.append('{}_keys = {}'.format(name, vals))
    for subdict in subdics:
        keylines(subdict[1], subdict[0], lines)
    return lines

result = keylines(dict0)
for line in result:
    print(line)
print()

# Compare against the expected lines and report any character-level differences
expect = [
    "outer_dict_keys = ['k0', 'k1', 'dict2', 'dict3']",
    "dict2_keys = ['k3', 'k4']",
    "dict3_keys = ['k5', 'dict4']",
    "dict4_keys = ['k6']"]
for actual, want in zip(result, expect):
    if actual != want:
        print(want)
        for i, (c1, c2) in enumerate(zip(actual, want)):
            if c1 != c2:
                print(i, c1, c2)
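For comparison with the depth-first recursion above, a breadth-first variant could be sketched with a queue. The function name keys_breadth_first and the sample dictionary are made up for illustration; the sample nests dict4 inside dict2, the case where the two orders differ.

```python
from collections import deque


def keys_breadth_first(dic):
    """Collect non-dict keys level by level instead of branch by branch."""
    found = []
    queue = deque([dic])
    while queue:
        current = queue.popleft()
        for key in sorted(current):
            value = current[key]
            if isinstance(value, dict):
                queue.append(value)  # visit later, after the current level
            else:
                found.append(key)
    return found


nested = {'k0': 0, 'k1': 1,
          'dict2': {'k3': 3, 'dict4': {'k6': 6}},
          'dict3': {'k5': 5}}
print(keys_breadth_first(nested))
```

Here 'k5' is reported before 'k6' because dict3 sits one level above dict4; a depth-first walk would finish dict2 and its nested dict4 first.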

Is it possible to delete keys meeting some criterion using a simple iteration in Python3?

Suppose we have a dict d={"key1":-1,"key2":-2,"key3":3,"key4":0,"key5":-7,"key6":1,...} in Python 3. Now I want to delete the keys whose value is negative, e.g. "key1":-1, "key2":-2, etc. I tried to write code like this:
for k in d:
    if d[k]<0:
        del d[k]
But I received an error saying "RuntimeError: dictionary changed size during iteration". From this message, it seems that it is not possible to delete keys meeting some criterion using a simple iteration, so at the moment I have to save the keys to be deleted in a list, then write another loop to remove them from d. My question is: is it really impossible to remove some of the keys using a single iteration? If it is possible, could you please give sample Python 3 code that removes keys meeting some criterion using a simple iteration? Thank you.
Method #1: use a dictionary comprehension. This doesn't delete so much as replace, but gets you to the same d.
>>> d = {"key1":-1,"key2":-2,"key3":3,"key4":0,"key5":-7}
>>> d = {k: v for k,v in d.items() if v >= 0}
>>> d
{'key3': 3, 'key4': 0}
Method #2: iterate over an independent copy of the keys:
>>> d = {"key1":-1,"key2":-2,"key3":3,"key4":0,"key5":-7}
>>> for k in set(d):
...     if d[k] < 0:
...         del d[k]
...
>>> d
{'key3': 3, 'key4': 0}
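The remark that the comprehension "replaces rather than deletes" matters when another name refers to the same dict. A small sketch (the name alias is made up for illustration):

```python
d = {"key1": -1, "key2": -2, "key3": 3, "key4": 0, "key5": -7}
alias = d  # a second reference to the same dict object

# Method #1 builds a new dict and rebinds d; alias still sees the old contents.
d = {k: v for k, v in d.items() if v >= 0}
print(len(d), len(alias))

# Method #2 mutates the dict in place, so every reference sees the change.
for k in list(alias):
    if alias[k] < 0:
        del alias[k]
print(alias == d, alias is d)
```

If other parts of a program hold a reference to the dictionary, the in-place loop is the variant that updates what they see.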
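Iterate over a copy of the keys instead of the dict itself. In Python 2, keys() returns a new list, so this works:
for k in d.keys():
    if d[k]<0:
        del d[k]
In Python 3.x, keys() returns a view of the live dictionary rather than a list, so you need to copy the keys first by using the following first line:
for k in list(d.keys()):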
