Out of order dictionary keys after merging in Python - python-3.x

After merging two dictionaries, the subkeys of the resulting dictionary are out of order. The subkeys are months ['jan', 'feb', 'mar', ... , 'dec']. In some cases an original dictionary does not contain a given subkey (month), so the output ends up disordered after the merge.
I have two dictionaries, both with the following structure:
{'Model': {'Jan': [1], 'Feb': [2], 'Jun': [5], ...}}
As you can see in this example, some subkeys (months) are missing from the original dicts. What I need is for the merged dict to keep the calendar order of the months, regardless of how the original dicts look.
The merging function:
def merge_dicts(dict1, dict2):
    '''Merge two dicts by adding up (accumulating) values in each key.
    Returns: A merge (addition) of two dictionaries by adding values of same keys
    '''
    # Merge dictionaries and add values of same keys
    out = {**dict1, **dict2}
    for key, value in out.items():
        if key in dict1 and key in dict2:
            out[key] = [value, dict1[key]]
            #Things got harder, out[key] appends in list of list of list... no itertools can help here.
            lst = [] #the one dimensional list to fix the problem of list of list with out[key]
            for el in out[key]:
                try:
                    #if inside out[key] there is a list of list we split it
                    for sub_el in el:
                        lst.append(sub_el)
                except:
                    #if inside out[key] there is only a single float
                    lst.append(el)
            #Replace the old key with the one dimensional list
            out[key] = lst
    return out
How I merge it:
for c in range(len([*CMs_selection.keys()])):
    if c == 0:
        #First merge, with dict0 & dict1
        merged_dict = {cm:merge_dicts(CMs_selection[[*CMs_selection.keys()][c]][cm], CMs_selection[[*CMs_selection.keys()][c + 1]][cm])
                       for cm in CMs_selection[[*CMs_selection.keys()][0]]}
    elif c > 0 and c < (len(years) - 1):
        #Second merge to n merge, starting with dict_merged and dict 2
        merged_dict = {cm:merge_dicts(merged_dict[cm], CMs_selection[[*CMs_selection.keys()][c + 1]][cm])
                       for cm in CMs_selection[[*CMs_selection.keys()][0]]}
Right now, after trying every merging approach I could think of, I always get this result:
{'Model1': {'Jan': [-0.0952586755156517,
0.1015196293592453,
-0.10572463274002075],
'Oct': [-0.02473766915500164,
0.0678798109292984,
0.08870666474103928,
-0.06378963589668274],
'Nov': [-0.08730728179216385,
0.013518977910280228,
0.023245899006724358,
-0.03917887806892395],
'Jul': [-0.07940272241830826, -0.04912888631224632, -0.07454635202884674],
'Dec': [-0.061335086822509766, -0.0033914903178811073, 0.09630533307790756],
'Mar': [0.029064208269119263, 0.11327305436134338, 0.009556809440255165],
'Apr': [-0.04433680325746536, -0.08620205521583557],
'Jun': [-0.036688946187496185, 0.05543896555900574, -0.07162825018167496],
'Aug': -0.03712410107254982,
'Sep': [0.007421047426760197, 0.008665643632411957],
'Feb': [-0.02879650704562664, 0.013025006279349327]},
'Model2': {'Feb': -0.05173473060131073,
'Jun': [-0.09126871824264526,
-0.09009774029254913,
0.10458160936832428,
-0.09445420652627945,
-0.04294373467564583],
'Aug': [-0.07917020469903946, 0.011026041582226753],
'Oct': [-0.10164830088615417, ....
....
With disorderly months. Please help me!

If we just focus on merging the dictionaries: first we need to define the calendar order of the months, then merge in that order, because Python does not know it. It cannot place "Mar" between "Feb" and "Apr" if "Mar" is missing from the first dictionary, so we have to define the order ourselves.
Also, you need two different behaviours: one for merging float values and one for merging lists. I added a mode parameter to my solution for this.
def merge_dicts(list_of_dicts, mode):
    keys = set(key for d in list_of_dicts for key in d.keys())
    months = ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]
    ordered_keys = [month for month in months if month in keys]
    out = {}
    for key in ordered_keys:
        out[key] = []
        for d in list_of_dicts:
            if key in d:
                if mode == "append":
                    out[key].append(d[key])
                elif mode == "extend":
                    out[key].extend(d[key])
    return out
CMs_selection = {2006: {'Model1': {'Jan': -0.1, 'Oct': -0.063}, 'Model2': {'Feb': -0.051, 'Jun': -0.04, 'Oct': 0.07}, 'Model3': {'Mar': -0.030, 'Jun': 0.02, 'Aug': 0.0561,}, 'Model4': {'Feb': -0.026, 'Dec': -0.06}}, 2007: {'Model1': {'Jul': -0.07, 'Oct': 0.8, 'Nov': 0.38, 'Dec': 0.1}, 'Model2': {'Jun': -0.09, 'Aug': -0.079, 'Sep': -0.7}}}
for key in CMs_selection:
    CMs_selection[key] = merge_dicts(CMs_selection[key].values(), "append")
print(CMs_selection)
result = merge_dicts(CMs_selection.values(), "extend")
print(result)
Output:
{2006: {'Jan': [-0.1], 'Feb': [-0.051, -0.026], 'Mar': [-0.03], 'Jun': [-0.04, 0.02], 'Aug': [0.0561], 'Oct': [-0.063, 0.07], 'Dec': [-0.06]}, 2007: {'Jun': [-0.09], 'Jul': [-0.07], 'Aug': [-0.079], 'Sep': [-0.7], 'Oct': [0.8], 'Nov': [0.38], 'Dec': [0.1]}}
{'Jan': [-0.1], 'Feb': [-0.051, -0.026], 'Mar': [-0.03], 'Jun': [-0.04, 0.02, -0.09], 'Jul': [-0.07], 'Aug': [0.0561, -0.079], 'Sep': [-0.7], 'Oct': [-0.063, 0.07, 0.8], 'Nov': [0.38], 'Dec': [-0.06, 0.1]}
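If the merged dictionary already exists and only the key order needs fixing, the same idea can be applied after the fact. The following is a minimal sketch, assuming the keys are the capitalized three-letter abbreviations used above; `MONTH_INDEX` and `reorder_months` are hypothetical names that do not appear in the original answer.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# Map each month abbreviation to its calendar position (hypothetical helper).
MONTH_INDEX = {month: i for i, month in enumerate(MONTHS)}

def reorder_months(d):
    """Return a new dict with the keys of d in calendar order.

    Relies on dicts preserving insertion order (Python 3.7+).
    Raises KeyError if a key is not a known month abbreviation.
    """
    return {month: d[month] for month in sorted(d, key=MONTH_INDEX.__getitem__)}

merged = {"Oct": [0.07], "Jan": [-0.1], "Jun": [-0.04, 0.02]}
ordered = reorder_months(merged)
print(list(ordered))  # ['Jan', 'Jun', 'Oct']
```

This leaves the merging logic untouched and only rebuilds the dictionary in the desired order, which may be easier to bolt onto existing code.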

Related

How can I use the sort method in Python to sort a list of dictionaries by a specific key in the dictionary? [duplicate]

How do I sort a list of dictionaries by a specific key's value? Given:
[{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age': 10}]
When sorted by name, it should become:
[{'name': 'Bart', 'age': 10}, {'name': 'Homer', 'age': 39}]
The sorted() function takes a key= parameter
newlist = sorted(list_to_be_sorted, key=lambda d: d['name'])
Alternatively, you can use operator.itemgetter instead of defining the function yourself
from operator import itemgetter
newlist = sorted(list_to_be_sorted, key=itemgetter('name'))
For completeness, add reverse=True to sort in descending order
newlist = sorted(list_to_be_sorted, key=itemgetter('name'), reverse=True)
import operator
To sort the list of dictionaries by key='name':
list_of_dicts.sort(key=operator.itemgetter('name'))
To sort the list of dictionaries by key='age':
list_of_dicts.sort(key=operator.itemgetter('age'))
my_list = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]
my_list.sort(lambda x,y : cmp(x['name'], y['name']))  # Python 2 only: cmp() and the cmp argument were removed in Python 3
my_list will now be what you want.
Or better:
Since Python 2.4, there's a key argument, which is both more efficient and neater:
my_list = sorted(my_list, key=lambda k: k['name'])
...the lambda is, IMO, easier to understand than operator.itemgetter, but your mileage may vary.
If you want to sort the list by multiple keys, you can do the following:
my_list = [{'name':'Homer', 'age':39}, {'name':'Milhouse', 'age':10}, {'name':'Bart', 'age':10} ]
sortedlist = sorted(my_list , key=lambda elem: "%02d %s" % (elem['age'], elem['name']))
It is rather hackish, since it relies on converting the values into a single string representation for comparison. It only works as expected when every number is non-negative and zero-padded to the same width (two digits here); negative numbers, or ages of 100 and above, compare incorrectly as strings.
a = [{'name':'Homer', 'age':39}, ...]
# This changes the list a
a.sort(key=lambda k : k['name'])
# This returns a new list (a is not modified)
sorted(a, key=lambda k : k['name'])
import operator
a_list_of_dicts.sort(key=operator.itemgetter('name'))
'key' is used to sort by an arbitrary value, and 'itemgetter' takes that value from each item's 'name' key.
I guess you meant:
[{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]
This would be sorted like this (Python 2 only, since the cmp argument was removed in Python 3):
sorted(l,cmp=lambda x,y: cmp(x['name'],y['name']))
You could use a custom comparison function, or you could pass in a function that calculates a custom sort key. That's usually more efficient as the key is only calculated once per item, while the comparison function would be called many more times.
You could do it this way:
def mykey(adict): return adict['name']
x = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age':10}]
sorted(x, key=mykey)
But the standard library contains a generic routine for getting items of arbitrary objects: itemgetter. So try this instead:
from operator import itemgetter
x = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age':10}]
sorted(x, key=itemgetter('name'))
Using the Schwartzian transform from Perl,
py = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]
do
sort_on = "name"
decorated = [(dict_[sort_on], i, dict_) for i, dict_ in enumerate(py)]  # the index breaks ties, so Python 3 never has to compare two dicts
decorated.sort()
result = [dict_ for (key, i, dict_) in decorated]
gives
>>> result
[{'age': 10, 'name': 'Bart'}, {'age': 39, 'name': 'Homer'}]
More on the Perl Schwartzian transform:
In computer science, the Schwartzian transform is a Perl programming
idiom used to improve the efficiency of sorting a list of items. This
idiom is appropriate for comparison-based sorting when the ordering is
actually based on the ordering of a certain property (the key) of the
elements, where computing that property is an intensive operation that
should be performed a minimal number of times. The Schwartzian
Transform is notable in that it does not use named temporary arrays.
Sometimes we need to use lower() for case-insensitive sorting. For example,
lists = [{'name':'Homer', 'age':39},
         {'name':'Bart', 'age':10},
         {'name':'abby', 'age':9}]
lists = sorted(lists, key=lambda k: k['name'])
print(lists)
# Bart, Homer, abby
# [{'name':'Bart', 'age':10}, {'name':'Homer', 'age':39}, {'name':'abby', 'age':9}]
lists = sorted(lists, key=lambda k: k['name'].lower())
print(lists)
# abby, Bart, Homer
# [ {'name':'abby', 'age':9}, {'name':'Bart', 'age':10}, {'name':'Homer', 'age':39}]
You have to implement your own comparison function that will compare the dictionaries by values of name keys. See Sorting Mini-HOW TO from PythonInfo Wiki
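In Python 3, `sorted()` and `list.sort()` no longer accept a comparison function directly, but `functools.cmp_to_key` wraps one into a key function. A minimal sketch of that approach, where `compare_by_name` is a hypothetical helper name chosen for illustration:

```python
from functools import cmp_to_key

def compare_by_name(a, b):
    """Return a negative, zero, or positive number, like the old cmp()."""
    return (a['name'] > b['name']) - (a['name'] < b['name'])

people = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age': 10}]
by_name = sorted(people, key=cmp_to_key(compare_by_name))
print(by_name)
# [{'name': 'Bart', 'age': 10}, {'name': 'Homer', 'age': 39}]
```

For a simple case like this one a plain key function is shorter; a comparison function is mostly useful when the ordering rule cannot easily be expressed as a key.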
Using the Pandas package is another method, though its runtime at large scale is much slower than the more traditional methods proposed by others:
import pandas as pd
listOfDicts = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]
df = pd.DataFrame(listOfDicts)
df = df.sort_values('name')
sorted_listOfDicts = df.T.to_dict().values()
Here are some benchmark values for a tiny list and a large (100k+) list of dicts:
setup_large = "listOfDicts = [];\
[listOfDicts.extend(({'name':'Homer', 'age':39}, {'name':'Bart', 'age':10})) for _ in range(50000)];\
from operator import itemgetter;import pandas as pd;\
df = pd.DataFrame(listOfDicts);"
setup_small = "listOfDicts = [];\
listOfDicts.extend(({'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}));\
from operator import itemgetter;import pandas as pd;\
df = pd.DataFrame(listOfDicts);"
method1 = "newlist = sorted(listOfDicts, key=lambda k: k['name'])"
method2 = "newlist = sorted(listOfDicts, key=itemgetter('name')) "
method3 = "df = df.sort_values('name');\
sorted_listOfDicts = df.T.to_dict().values()"
import timeit
t = timeit.Timer(method1, setup_small)
print('Small Method LC: ' + str(t.timeit(100)))
t = timeit.Timer(method2, setup_small)
print('Small Method LC2: ' + str(t.timeit(100)))
t = timeit.Timer(method3, setup_small)
print('Small Method Pandas: ' + str(t.timeit(100)))
t = timeit.Timer(method1, setup_large)
print('Large Method LC: ' + str(t.timeit(100)))
t = timeit.Timer(method2, setup_large)
print('Large Method LC2: ' + str(t.timeit(100)))
t = timeit.Timer(method3, setup_large)
print('Large Method Pandas: ' + str(t.timeit(1)))
#Small Method LC: 0.000163078308105
#Small Method LC2: 0.000134944915771
#Small Method Pandas: 0.0712950229645
#Large Method LC: 0.0321750640869
#Large Method LC2: 0.0206089019775
#Large Method Pandas: 5.81405615807
Here is the alternative general solution - it sorts elements of a dict by keys and values.
The advantage of it - no need to specify keys, and it would still work if some keys are missing in some of dictionaries.
def sort_key_func(item):
    """ Helper function used to sort list of dicts
    :param item: dict
    :return: sorted list of tuples (k, v)
    """
    pairs = []
    for k, v in item.items():
        pairs.append((k, v))
    return sorted(pairs)
sorted(A, key=sort_key_func)
If you do not need the original list of dictionaries, you could modify it in-place with sort() method using a custom key function.
Key function:
def get_name(d):
    """ Return the value of a key in a dictionary. """
    return d["name"]
The list to be sorted:
data_one = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age': 10}]
Sorting it in-place:
data_one.sort(key=get_name)
If you need the original list, call the sorted() function passing it the list and the key function, then assign the returned sorted list to a new variable:
data_two = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age': 10}]
new_data = sorted(data_two, key=get_name)
Printing data_one and new_data.
>>> print(data_one)
[{'name': 'Bart', 'age': 10}, {'name': 'Homer', 'age': 39}]
>>> print(new_data)
[{'name': 'Bart', 'age': 10}, {'name': 'Homer', 'age': 39}]
Let's say I have a dictionary D with the elements below. To sort, just use the key argument in sorted to pass a custom function as below:
D = {'eggs': 3, 'ham': 1, 'spam': 2}
def get_count(pair):
    # pair is a (key, value) tuple from D.items()
    return pair[1]
sorted(D.items(), key=get_count, reverse=True)
# Or
sorted(D.items(), key=lambda x: x[1], reverse=True)  # same result without defining get_count
I have been a big fan of sorting with a lambda. However, it is not the best option if you care about execution time.
First option
sorted_list = sorted(list_to_sort, key=lambda x: x['name'])
# Returns a new sorted list
Second option
import operator
list_to_sort.sort(key=operator.itemgetter('name'))
# Sorts the list in place and does not return a new list
Fast comparison of execution times
# First option
python3.6 -m timeit -s "list_to_sort = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}, {'name':'Faaa', 'age':57}, {'name':'Errr', 'age':20}]" -s "sorted_l=[]" "sorted_l = sorted(list_to_sort, key=lambda e: e['name'])"
1000000 loops, best of 3: 0.736 µsec per loop
# Second option
python3.6 -m timeit -s "list_to_sort = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}, {'name':'Faaa', 'age':57}, {'name':'Errr', 'age':20}]" -s "sorted_l=[]" -s "import operator" "list_to_sort.sort(key=operator.itemgetter('name'))"
1000000 loops, best of 3: 0.438 µsec per loop
If performance is a concern, I would use operator.itemgetter instead of lambda as built-in functions perform faster than hand-crafted functions. The itemgetter function seems to perform approximately 20% faster than lambda based on my testing.
From https://wiki.python.org/moin/PythonSpeed:
Likewise, the builtin functions run faster than hand-built equivalents. For example, map(operator.add, v1, v2) is faster than map(lambda x,y: x+y, v1, v2).
Here is a comparison of sorting speed using lambda vs itemgetter.
import random
import string
import operator
# Create a list of 100 dicts with random 8-letter names and random ages from 0 to 100.
l = [{'name': ''.join(random.choices(string.ascii_lowercase, k=8)), 'age': random.randint(0, 100)} for i in range(100)]
# Test the performance with a lambda function sorting on name
%timeit sorted(l, key=lambda x: x['name'])
13 µs ± 388 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# Test the performance with itemgetter sorting on name
%timeit sorted(l, key=operator.itemgetter('name'))
10.7 µs ± 38.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# Check that each technique produces the same sort order
sorted(l, key=lambda x: x['name']) == sorted(l, key=operator.itemgetter('name'))
True
Both techniques sort the list in the same order (verified by execution of the final statement in the code block), but the second one (itemgetter) is a little faster.
As indicated by @Claudiu to @monojohnny in the comment section of this answer, given:
list_to_be_sorted = [
{'name':'Homer', 'age':39},
{'name':'Milhouse', 'age':10},
{'name':'Bart', 'age':10}
]
to sort the list of dictionaries by key 'age', 'name'
(like in SQL statement ORDER BY age, name), you can use:
newlist = sorted( list_to_be_sorted, key=lambda k: (k['age'], k['name']) )
or, likewise
import operator
newlist = sorted( list_to_be_sorted, key=operator.itemgetter('age','name') )
print(newlist)
[{'name': 'Bart', 'age': 10}, {'name': 'Milhouse', 'age': 10},
{'name': 'Homer', 'age': 39}]
You can use the following:
lst = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age': 10}]
sorted_lst = sorted(lst, key=lambda x: x['age']) # change this to sort by a different field
print(sorted_lst)
Sorting by multiple columns, with descending order on some of them:
The cmps list is global to the comparison function; it contains field names, with inv == -1 for descending and 1 for ascending. (Written for Python 3, where cmp() no longer exists and sorted() needs functools.cmp_to_key.)
from functools import cmp_to_key

def cmp(a, b):
    # replacement for the cmp() built-in that Python 3 removed
    return (a > b) - (a < b)

def cmpfun(a, b):
    for (name, inv) in cmps:
        res = cmp(a[name], b[name])
        if res != 0:
            return res * inv
    return 0

data = [
    dict(name='alice', age=10),
    dict(name='baruch', age=9),
    dict(name='alice', age=11),
]
all_cmps = [
    [('name', 1), ('age', -1)],
    [('name', 1), ('age', 1)],
    [('name', -1), ('age', 1)],
]
print('data:', data)
for cmps in all_cmps:
    print('sort:', cmps)
    print(sorted(data, key=cmp_to_key(cmpfun)))
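An alternative that avoids a comparison function altogether: Python's sort is stable, so sorting once per field, from the least significant field to the most significant, produces the same ordering. The sketch below assumes that approach; `multisort` is a hypothetical helper name, and each spec pairs a field name with `True` for descending order.

```python
from operator import itemgetter

def multisort(rows, specs):
    """Sort rows by several fields; specs is a list of (field, descending) pairs.

    Sorts by the last field first; stability keeps earlier passes intact on ties.
    """
    result = list(rows)
    for field, descending in reversed(specs):
        result.sort(key=itemgetter(field), reverse=descending)
    return result

data = [
    dict(name='alice', age=10),
    dict(name='baruch', age=9),
    dict(name='alice', age=11),
]
# name ascending, then age descending
result = multisort(data, [('name', False), ('age', True)])
print(result)
```

With this data the result lists alice (11), alice (10), then baruch (9). The input list is copied first, so `data` itself is left unchanged.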

How to get a dict from a list of dict according to some threshold

I have a list of dicts like the one below:
list_dict = [{'Name': 'Andres', 'score': 0.17669814825057983},
{'Name': 'Paul', 'score': 0.14028045535087585},
{'Name': 'Feder', 'score': 0.1379694938659668},
{'Name': 'James', 'score': 0.1348174512386322}]
I want to output another list of dicts, containing only the dicts whose score is higher than a threshold of 0.15.
Expected output: [{'Name':'Andres', 'score' : 0.1766..}]
I did this, but the code is terrible and the output is wrongly formatted:
l = []
for i in range(len(list_dict)):
    for k in list_dict[i]['name']:
        if list_dict[i]['score']>0.15:
            print(k)
Maybe this is what you're looking for?
Actually you're very close, you just got a few details wrong.
Each item in list_dict is a dictionary, so you can iterate over the items directly and check the score of each one; there is no need to use an index to reach it.
new_dc = list()
for item in list_dict:          # each item is a dictionary
    if item['score'] > 0.15:    # a meaningful variable name is better than an index
        new_dc.append(item)
print(new_dc) # [{'Name': 'Andres', 'score': 0.17669814825057983}]
Alternatively you can use a list comprehension:
output = [item for item in list_dict if item['score'] > 0.15]
assert new_dc == output   # Silence means they're the same
1st approach using loop
final_list = []
for each in list_dict: #simply iterate through each dict in list and compare score
    if each['score']>0.15:
        final_list.append(each)
print(final_list)
2nd approach using list comprehension
final_list = [item for item in list_dict if item['score']>0.15]
print(final_list)

make two new lists from the current list

I have a list name list1:
list1 = [['DC1', 'C4'], ['DC1', 'C5'], ['DC3', 'C1'], ['DC3', 'C2'], ['DC3', 'C3']]
I want to make two new lists:
list1_1 = ['DC1', 'C4', 'C5']
list1_2 = ['DC3', 'C1', 'C2', 'C3']
Can anyone please show me how to do this?
Thank you.
This can solve your problem. Note: it is not optimized.
yourlist = [['DC1', 'C4'], ['DC1', 'C5'], ['DC3', 'C1'], ['DC3', 'C2'], ['DC3', 'C3']]
temp_dict = {}
for i in yourlist:
    if i[0] not in temp_dict:
        temp_dict.update({i[0]:[i[1]]})
    else:
        temp_dict[i[0]].append(i[1])
final_list =[]
for i,j in temp_dict.items():
    temp_list =[i]
    for k in j:
        temp_list.append(k)
    final_list.append(temp_list)
list1_1 = final_list[0]
list1_2 = final_list[1]
Output:
list1_1
['DC1', 'C4', 'C5']
list1_2
['DC3', 'C1', 'C2', 'C3']
Building on Tamil Selvan's answer, but using a defaultdict for simplicity and list concatenation instead of appends for the big list.
from collections import defaultdict
list1 = [['DC1', 'C4'], ['DC1', 'C5'], ['DC3', 'C1'], ['DC3', 'C2'], ['DC3', 'C3']]
# First we create a dict containing the left terms as keys and the right terms as values.
d = defaultdict(list)
for (key, value) in list1:
    d[key].append(value)
print(d)
# {'DC1': ['C4', 'C5'],
# 'DC3': ['C1', 'C2', 'C3']}
# Then for each key, we create a list of values with the key as first item.
lists = []
for key, values in d.items():
    sublist = [key, *values]
    lists.append(sublist)
print(lists)
# [['DC1', 'C4', 'C5'],
# ['DC3', 'C1', 'C2', 'C3']]
# Finally, you can easily take the sublist that you want.
list1_1, list1_2 = lists

How to sort list into clumps of given size?

I am trying to create a method that sorts a list of variables into clumps of size four, with the same characters grouped together and in the same order as they are given. You may assume the only given characters are a, b, and c. For example, here I would like to sort myInitialList.
myInitialList = ['b1', 'c1', 'b2', 'c2', 'c3', 'b3', 'c4', 'a1', 'b4', 'b5', 'a2', 'c5', 'a3', 'a4', 'a5', 'c6', 'a6', 'a7', 'a8','a9']
endList = clumpsSize4(myInitialList)
print(endList)
This should output the result:
['a1','a2','a3','a4','b1','b2','b3','b4','c1','c2','c3','c4','a5','a6','a7','a8','b5','c5','c6','a9']
How do I write the clumpsSize4 method?
This is not the most efficient, but here is my attempt. Sort the input. Have one default dict groupNums which links a letter to the current number clump it is on. Have another default dict groups which contains the actual clumps. Sort the groups at the end, iterate over them and join:
from collections import defaultdict
def clump(l, size=4):
    groups = defaultdict(list)
    groupNums = defaultdict(int)
    l = sorted(l)
    for i in l:
        letter = i[0]
        key = str(groupNums[letter]) + letter
        groups[key].append(i)
        if len(groups[key]) == size:
            groupNums[letter] += 1
    result = []
    for _, g in sorted(groups.items()):
        result += g
    return result
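One caveat with the string keys above: '10a' sorts before '2a', so the clump order breaks once a letter has more than ten clumps, and sorted(l) places 'a10' before 'a2'. The following sketch is a variant that keeps the input order within each letter and uses (clump number, letter) tuples as keys; the name `clump_in_order` is hypothetical.

```python
from collections import defaultdict

def clump_in_order(items, size=4):
    """Group items by first character into clumps of at most `size`,
    keeping items with the same letter in the order they were given."""
    groups = defaultdict(list)      # (clump number, letter) -> items
    clump_nums = defaultdict(int)   # letter -> clump currently being filled
    for item in items:
        letter = item[0]
        key = (clump_nums[letter], letter)
        groups[key].append(item)
        if len(groups[key]) == size:
            clump_nums[letter] += 1
    result = []
    # Keys are unique tuples, so sorting orders by clump number, then letter.
    for _, group in sorted(groups.items()):
        result.extend(group)
    return result

myInitialList = ['b1', 'c1', 'b2', 'c2', 'c3', 'b3', 'c4', 'a1', 'b4', 'b5',
                 'a2', 'c5', 'a3', 'a4', 'a5', 'c6', 'a6', 'a7', 'a8', 'a9']
endList = clump_in_order(myInitialList)
print(endList)
```

For the list from the question this prints the same sequence as the expected output shown there.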

Comparing like words between two dictionaries

I am using python 3.x,
I have 2 dictionaries (both very large but will substitute here). The values of the dictionaries contain more than one word:
dict_a = {'key1': 'Large left panel', 'key2': 'Orange bear rug', 'key3': 'Luxo jr. lamp'}
dict_a
{'key1': 'Large left panel',
'key2': 'Orange bear rug',
'key3': 'Luxo jr. lamp'}
dict_b = {'keyX': 'titanium panel', 'keyY': 'orange Ball and chain', 'keyZ': 'large bear musket'}
dict_b
{'keyX': 'titanium panel',
'keyY': 'orange Ball and chain',
'keyZ': 'large bear musket'}
I am looking for a way to compare the individual words contained in the values of dict_a to the words contained in the values of dict_b and return a dictionary or data-frame that contains the word, and the keys from dict_a and dict_b it is associated with:
My desired output (not formatted any certain way):
bear: key2 (from dict_a), keyZ(from dict_b)
Luxo: key3
orange: key2 (from dict_a), keyY (from dict_b)
I've got code that works for looking up a specific word in a single dictionary but it's not sufficient for what I need to accomplish here:
def search(myDict, lookup):
    aDict = {}
    for key, value in myDict.items():
        for v in value:
            if lookup in v:
                aDict[key] = value
    return aDict
print (key, value)
dicts = {'a': {'key1': 'Large left panel', 'key2': 'Orange bear rug',
'key3': 'Luxo jr. lamp'},
'b': {'keyX': 'titanium panel', 'keyY': 'orange Ball and chain',
'keyZ': 'large bear musket'} }
from collections import defaultdict
index = defaultdict(list)
for dname, d in dicts.items():
    for key, words in d.items():
        for word in words.lower().split(): # lower() to make Orange/orange match
            index[word].append((dname, key))
index now contains:
{'and' : [('b', 'keyY')],
'ball' : [('b', 'keyY')],
'bear' : [('a', 'key2'), ('b', 'keyZ')],
'chain' : [('b', 'keyY')],
'jr.' : [('a', 'key3')],
'lamp' : [('a', 'key3')],
'large' : [('a', 'key1'), ('b', 'keyZ')],
'left' : [('a', 'key1')],
'luxo' : [('a', 'key3')],
'musket' : [('b', 'keyZ')],
'orange' : [('a', 'key2'), ('b', 'keyY')],
'panel' : [('a', 'key1'), ('b', 'keyX')],
'rug' : [('a', 'key2')],
'titanium': [('b', 'keyX')] }
Update to comments
Since your actual dictionary is a mapping from string to list (and not string to string), change your loops to
for dname, d in dicts.items():
    for key, wordlist in d.items(): # changed "words" to "wordlist"
        for words in wordlist: # added extra loop to iterate over wordlist
            for word in words.split(): # removed .lower() since text is always uppercase
                index[word].append((dname, key))
Since your lists have only one item, you could just do
for dname, d in dicts.items():
    for key, wordlist in d.items():
        for word in wordlist[0].split(): # assumes single item list
            index[word].append((dname, key))
If there are words that you don't want in your index, you can skip them:
words_to_skip = {'-', ';', '/', 'AND', 'TO', 'UP', 'WITH', ''}
Then filter them out with
if word in words_to_skip:
    continue
I noticed that you have some words surrounded by parentheses (such as (342) and (221)). If you want to get rid of the parentheses, do
if word[0] == '(' and word[-1] == ')':
    word = word[1:-1]
Putting this all together we get
words_to_skip = {'-', ';', '/', 'AND', 'TO', 'UP', 'WITH', ''}
for dname, d in dicts.items():
    for key, wordlist in d.items():
        for word in wordlist[0].split(): # assumes single item list
            if word[0] == '(' and word[-1] == ')':
                word = word[1:-1] # remove outer parentheses
            if word in words_to_skip: # skip unwanted words
                continue
            index[word].append((dname, key))
I think you can do what you want pretty easily. This code produces output in the format {word: {key: name_of_dict_the_key_is_in}}:
def search(**dicts):
    result = {}
    for name, dct in dicts.items():
        for key, value in dct.items():
            for word in value.split():
                result.setdefault(word, {})[key] = name
    return result
You call it with the input dictionaries as keyword arguments. The keyword you use for each dictionary will be the string used to describe it in the output dictionary, so use something like search(dict_a=dict_a, dict_b=dict_b).
If your dictionaries might have some of the same keys, this code might not work right, since the keys could collide if they have the same words in their values. You could make the outer dict contain a list of (key, name) tuples, instead of an inner dictionary, I suppose. Just change the assignment line to result.setdefault(word, []).append((key, name)). That would be less handy to search in though.
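To get closer to the output the question asks for, the word index can be filtered down to the words that occur in both dictionaries. This is a minimal sketch built on the string-to-string sample data from the question; the labels 'a' and 'b' and the name `shared` are assumptions made for illustration.

```python
from collections import defaultdict

dict_a = {'key1': 'Large left panel', 'key2': 'Orange bear rug', 'key3': 'Luxo jr. lamp'}
dict_b = {'keyX': 'titanium panel', 'keyY': 'orange Ball and chain', 'keyZ': 'large bear musket'}

# Build the word index: word -> list of (dictionary label, key) pairs.
index = defaultdict(list)
for dname, d in {'a': dict_a, 'b': dict_b}.items():
    for key, text in d.items():
        for word in text.lower().split():  # lower() so Orange/orange match
            index[word].append((dname, key))

# Keep only the words that occur in both dictionaries.
shared = {word: hits for word, hits in index.items()
          if {dname for dname, _ in hits} == {'a', 'b'}}

for word in sorted(shared):
    print(word, shared[word])
```

Because matching is case-insensitive here, 'large' and 'panel' are reported as shared in addition to 'bear' and 'orange'.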

Resources