I am using Python 3.x.
I have two dictionaries (both very large, so I will substitute small examples here). The values of the dictionaries contain more than one word:
dict_a = {'key1': 'Large left panel', 'key2': 'Orange bear rug', 'key3': 'Luxo jr. lamp'}
dict_a
{'key1': 'Large left panel',
'key2': 'Orange bear rug',
'key3': 'Luxo jr. lamp'}
dict_b = {'keyX': 'titanium panel', 'keyY': 'orange Ball and chain', 'keyZ': 'large bear musket'}
dict_b
{'keyX': 'titanium panel',
'keyY': 'orange Ball and chain',
'keyZ': 'large bear musket'}
I am looking for a way to compare the individual words contained in the values of dict_a to the words contained in the values of dict_b and return a dictionary or data-frame that contains the word, and the keys from dict_a and dict_b it is associated with:
My desired output (not formatted any certain way):
bear: key2 (from dict_a), keyZ(from dict_b)
Luxo: key3
orange: key2 (from dict_a), keyY (from dict_b)
I've got code that works for looking up a specific word in a single dictionary but it's not sufficient for what I need to accomplish here:
def search(myDict, lookup):
    aDict = {}
    for key, value in myDict.items():
        for v in value:
            if lookup in v:
                aDict[key] = value
    return aDict
print (key, value)
dicts = {'a': {'key1': 'Large left panel', 'key2': 'Orange bear rug',
               'key3': 'Luxo jr. lamp'},
         'b': {'keyX': 'titanium panel', 'keyY': 'orange Ball and chain',
               'keyZ': 'large bear musket'} }
from collections import defaultdict
index = defaultdict(list)
for dname, d in dicts.items():
    for key, words in d.items():
        for word in words.lower().split(): # lower() to make Orange/orange match
            index[word].append((dname, key))
index now contains:
{'and' : [('b', 'keyY')],
'ball' : [('b', 'keyY')],
'bear' : [('a', 'key2'), ('b', 'keyZ')],
'chain' : [('b', 'keyY')],
'jr.' : [('a', 'key3')],
'lamp' : [('a', 'key3')],
'large' : [('a', 'key1'), ('b', 'keyZ')],
'left' : [('a', 'key1')],
'luxo' : [('a', 'key3')],
'musket' : [('b', 'keyZ')],
'orange' : [('a', 'key2'), ('b', 'keyY')],
'panel' : [('a', 'key1'), ('b', 'keyX')],
'rug' : [('a', 'key2')],
'titanium': [('b', 'keyX')] }
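If only the words that occur in both dictionaries are of interest, the index can be filtered afterwards. This is a minimal sketch that rebuilds the index from the sample data above; the name shared is made up for illustration.

```python
from collections import defaultdict

dicts = {'a': {'key1': 'Large left panel', 'key2': 'Orange bear rug',
               'key3': 'Luxo jr. lamp'},
         'b': {'keyX': 'titanium panel', 'keyY': 'orange Ball and chain',
               'keyZ': 'large bear musket'}}

# Build the word -> [(dict name, key), ...] index as above.
index = defaultdict(list)
for dname, d in dicts.items():
    for key, words in d.items():
        for word in words.lower().split():
            index[word].append((dname, key))

# Keep only words seen in more than one dictionary ("shared" is a hypothetical name).
shared = {word: hits for word, hits in index.items()
          if len({dname for dname, _ in hits}) > 1}

for word in sorted(shared):
    print(word, shared[word])
```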
Update to comments
Since your actual dictionary is a mapping from string to list (and not string to string), change your loops to
for dname, d in dicts.items():
    for key, wordlist in d.items(): # changed "words" to "wordlist"
        for words in wordlist: # added extra loop to iterate over wordlist
            for word in words.split(): # removed .lower() since text is always uppercase
                index[word].append((dname, key))
Since your lists have only one item, you could just do
for dname, d in dicts.items():
    for key, wordlist in d.items():
        for word in wordlist[0].split(): # assumes single item list
            index[word].append((dname, key))
If you have words that you don't want to be added to your index you can skip adding them to the index:
words_to_skip = {'-', ';', '/', 'AND', 'TO', 'UP', 'WITH', ''}
Then filter them out with
if word in words_to_skip:
    continue
I noticed that you have some words surrounded by parentheses (such as (342) and (221)). If you want to get rid of the parentheses, do
if word[0] == '(' and word[-1] == ')':
    word = word[1:-1]
Putting this all together we get
words_to_skip = {'-', ';', '/', 'AND', 'TO', 'UP', 'WITH', ''}
for dname, d in dicts.items():
    for key, wordlist in d.items():
        for word in wordlist[0].split(): # assumes single item list
            if word[0] == '(' and word[-1] == ')':
                word = word[1:-1] # remove outer parentheses
            if word in words_to_skip: # skip unwanted words
                continue
            index[word].append((dname, key))
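For completeness, here is the same loop as a runnable sketch. The real data was not posted, so the sample dictionaries below are invented, and the function name build_index is only for illustration.

```python
from collections import defaultdict

def build_index(dicts, words_to_skip):
    """Map each word to the (dict name, key) pairs it occurs in."""
    index = defaultdict(list)
    for dname, d in dicts.items():
        for key, wordlist in d.items():
            for word in wordlist[0].split():  # assumes single item list
                if word[0] == '(' and word[-1] == ')':
                    word = word[1:-1]  # remove outer parentheses
                if word in words_to_skip:  # skip unwanted words
                    continue
                index[word].append((dname, key))
    return index

# Hypothetical sample data: uppercase text, one-item lists.
sample = {'a': {'key1': ['LARGE LEFT PANEL (342)']},
          'b': {'keyX': ['TITANIUM PANEL WITH (221)']}}
skip = {'-', ';', '/', 'AND', 'TO', 'UP', 'WITH', ''}
index = build_index(sample, skip)
print(dict(index))
```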
I think you can do what you want pretty easily. This code produces output in the format {word: {key: name_of_dict_the_key_is_in}}:
def search(**dicts):
    result = {}
    for name, dct in dicts.items():
        for key, value in dct.items():
            for word in value.split():
                result.setdefault(word, {})[key] = name
    return result
You call it with the input dictionaries as keyword arguments. The keyword you use for each dictionary will be the string used to describe it in the output dictionary, so use something like search(dict_a=dict_a, dict_b=dict_b).
If your dictionaries might have some of the same keys, this code might not work right, since the keys could collide if they have the same words in their values. You could make the outer dict contain a list of (key, name) tuples, instead of an inner dictionary, I suppose. Just change the assignment line to result.setdefault(word, []).append((key, name)). That would be less handy to search in though.
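A sketch of that list-of-tuples variant might look like the following. The name search_all is made up here, and the .lower() call is an extra assumption so that Orange and orange are treated as the same word.

```python
def search_all(**dicts):
    """Return {word: [(key, dict_name), ...]} across all given dictionaries."""
    result = {}
    for name, dct in dicts.items():
        for key, value in dct.items():
            # lower() is an assumption: it makes the match case-insensitive
            for word in value.lower().split():
                result.setdefault(word, []).append((key, name))
    return result

# Both dictionaries deliberately share the key 'k1' to show there is no collision.
found = search_all(dict_a={'k1': 'Orange bear rug'},
                   dict_b={'k1': 'orange Ball and chain'})
print(found['orange'])
```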
Related
I am trying to create a method that sorts a list of variables into clumps of size four, with the same characters grouped together and in the same order as they are given. You may assume the only given characters are a, b, and c. For example, here I would like to sort myInitialList.
myInitialList = ['b1', 'c1', 'b2', 'c2', 'c3', 'b3', 'c4', 'a1', 'b4', 'b5', 'a2', 'c5', 'a3', 'a4', 'a5', 'c6', 'a6', 'a7', 'a8','a9']
endList = clumpsSize4(myInitialList)
print(endList)
This should output the result:
['a1','a2','a3','a4','b1','b2','b3','b4','c1','c2','c3','c4','a5','a6','a7','a8','b5','c5','c6','a9']
How do I write the clumpsSize4 method?
This is not the most efficient, but here is my attempt. Sort the input. Have one default dict groupNums which links a letter to the current number clump it is on. Have another default dict groups which contains the actual clumps. Sort the groups at the end, iterate over them and join:
from collections import defaultdict

def clump(l, size=4):
    groups = defaultdict(list)
    groupNums = defaultdict(int)
    l = sorted(l)
    for i in l:
        letter = i[0]
        key = str(groupNums[letter]) + letter
        groups[key].append(i)
        if len(groups[key]) == size:
            groupNums[letter] += 1
    result = []
    for _, g in sorted(groups.items()):
        result += g
    return result
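A variation that avoids sorting the input, so items of the same letter stay in the order they were given, is to collect the items per letter, cut each letter's list into chunks, and order the chunks by (chunk number, letter). This is only a sketch, and the name clumps_of is made up. Using an integer chunk number also sidesteps the string comparison of keys like "10a" and "2a".

```python
from collections import defaultdict

def clumps_of(items, size=4):
    # Collect items per leading letter, keeping the given order.
    by_letter = defaultdict(list)
    for item in items:
        by_letter[item[0]].append(item)

    # Cut each letter's items into chunks tagged with (chunk number, letter).
    chunks = []
    for letter, members in by_letter.items():
        for start in range(0, len(members), size):
            chunks.append((start // size, letter, members[start:start + size]))

    # Order by chunk number first, then by letter, and flatten.
    chunks.sort(key=lambda c: (c[0], c[1]))
    return [item for _, _, chunk in chunks for item in chunk]

myInitialList = ['b1', 'c1', 'b2', 'c2', 'c3', 'b3', 'c4', 'a1', 'b4', 'b5',
                 'a2', 'c5', 'a3', 'a4', 'a5', 'c6', 'a6', 'a7', 'a8', 'a9']
endList = clumps_of(myInitialList)
print(endList)
```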
I want to sort a list of words by word length but maintain the original order of words with the same length (stable sorting).
For example:
words = ['dog', 'cat', 'hello', 'girl', 'py', 'book']
Should become
words = ['py', 'dog', 'cat', 'girl', 'book', 'hello']
I know this can be done easily with python's sorted function, which is stable:
sorted(words, key = lambda l:len(l))
But how would one do this efficiently without access to the library? I've been trying to brainstorm a solution and I'm thinking
1. Create a dict that will contain the index (key) and length (value) of each word in words
2. Maybe use a Counter to track how many instances of each word length there are (for this example, it would look like Counter({3: 2, 5: 1, 4: 2, 2: 1}))
3. Go through the dict looking for the minimum word length in the Counter (2 in my example), and keep doing that for the number of times that length appears
4. Repeat step 3, moving on to the next minimum in the Counter
But this seems super inefficient. Could someone help with a better implementation?
You can just use a normal bubble-sort-style technique for this.
Algo
Loop i from 0 to len(words)-2
Nested loop j from i+1 to len(words)-1
Compare whether len(words[i]) > len(words[j])
If so, swap words[i] and words[j]
words = ['dog', 'cat', 'hello', 'girl', 'py', 'book']
# first loop
for i in range(len(words)-1):
    # second loop
    for j in range(i+1, len(words)):
        # swapping if first word is longer than the second
        if len(words[i]) > len(words[j]):
            temp = words[i]
            words[i] = words[j]
            words[j] = temp
print(words)
Output
['py', 'cat', 'dog', 'girl', 'book', 'hello']
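This runs in O(N*N) time, so it is fairly costly, but it is usable for inputs of up to about 10^3 items. Note that the output above places 'cat' before 'dog', so this swap-based loop does not keep equal-length words in their original order.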
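If the original order of equal-length words has to be preserved, an insertion sort is one option that is stable and also needs no library calls. A minimal sketch follows (the function name stable_sort_by_length is made up for illustration). It is still O(N*N) in the worst case.

```python
def stable_sort_by_length(words):
    result = list(words)  # work on a copy
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift strictly longer words to the right; equal lengths are
        # never moved past each other, which keeps the sort stable.
        while j >= 0 and len(result[j]) > len(current):
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result

words = ['dog', 'cat', 'hello', 'girl', 'py', 'book']
print(stable_sort_by_length(words))
```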
One solution is to make a dictionary of buckets, where the keys are word lengths (integers) and the values are lists storing the words of that length. You can construct this dict in linear time.
Then traverse the dictionary from 0 to max_bucket, where max_bucket is the maximal length of an observed word:
words = ['dog', 'cat', 'hello', 'girl', 'py', 'book']
buckets = {}
max_bucket = -1
for w in words:
    buckets.setdefault(len(w), []).append(w)
    if len(w) > max_bucket:
        max_bucket = len(w)
out = [w for i in range(max_bucket+1) for w in buckets.get(i, [])]
print(out)
Prints:
['py', 'dog', 'cat', 'girl', 'book', 'hello']
d = {'A': ['A11117',
'33465'
'17160144',
'A11-33465',
'3040',
'A11-33465 W1',
'nor'], 'B': ['maD', 'vern', 'first', 'A2lRights']}
I have a dictionary d and I would like to sort the values based on length of characters. For instance, for key A the value A11-33465 W1 would be first because it contains 12 characters followed by 'A11-33465' because it contains 9 characters etc. I would like this output:
d = {'A': ['A11-33465 W1',
' A11-33465',
'17160144',
'A11117',
'33465',
'3040',
'nor'],
'B': ['A2lRights',
'first',
'vern',
'maD']}
(I understand that dictionaries cannot be sorted, but the answers to the questions below, which did not work for me, do produce a sorted dictionary.)
I have tried the following
python sorting dictionary by length of values
print(' '.join(sorted(d, key=lambda k: len(d[k]), reverse=True)))
Sort a dictionary by length of the value
sorted_items = sorted(d.items(), key = lambda item : len(item[1]))
newd = dict(sorted_items[-2:])
How do I sort a dictionary by value?
import operator
sorted_x = sorted(d.items(), key=operator.itemgetter(1))
But none of them give me what I am looking for.
How do I get my desired output?
You are not sorting the dict, you are sorting the lists inside it. The simplest will be a loop that sorts the lists in-place:
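for k, lst in d.items():
    lst.sort(key=len, reverse=True)
This will turn d into the following (the first entry, '3346517160144', appears because the comma after '33465' is missing in your literal, so Python concatenates the two adjacent strings):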
{'A': ['3346517160144', 'A11-33465 W1', 'A11-33465', 'A11117', '3040', 'nor'],
'B': ['A2lRights', 'first', 'vern', 'maD']}
If you want to keep the original data intact, use a comprehension like:
sorted_d = {k: sorted(lst, key=len, reverse=True) for k, lst in d.items()}
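As a self-contained check, here is the comprehension applied to the question's data, assuming the comma after '33465' was intended. Because sorted is stable, strings of equal length keep their original relative order.

```python
# Assumption: the comma after '33465' is restored, so it is a separate entry.
d = {'A': ['A11117', '33465', '17160144', 'A11-33465', '3040',
           'A11-33465 W1', 'nor'],
     'B': ['maD', 'vern', 'first', 'A2lRights']}

# Build a new dict; the lists in d are left untouched.
sorted_d = {k: sorted(lst, key=len, reverse=True) for k, lst in d.items()}
print(sorted_d)
```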
I'm sure this has been asked and answered, but I can't find it. I have this dictionary:
{'22775': 15.9,
'22778': 29.2,
'22776': 20.25,
'22773': 9.65,
'22777': 22.9,
'22774': 12.45}
Each key is a string and each value is a float.
I want to list the key strings in a tk listbox to allow the user to select one and then use the corresponding float in a calculation to determine a delay factor in an event.
I have this code:
def dic_entry(line):
    #Create key:value pairs from string
    key, sep, value = line.strip().partition(":")
    return key, float(value)

with open(filename1) as f_obj:
    s = dict(dic_entry(line) for line in f_obj)
print (s) #for testing only
s_ord = sorted(s.items(),key=lambda x: x[1])
print (s_ord)
The first print gets me
{'22775': 15.9,
'22778': 29.2,
'22776': 20.25,
'22773': 9.65,
'22777': 22.9,
'22774': 12.45}
as expected. The second, which I hoped would give me an ordered list of keys gets me
[('22773', 9.65),
('22774', 12.45),
('22775', 15.9),
('22776', 20.25),
('22777', 22.9),
('22778', 29.2)].
I have tried using sorteddictionary from the collections module and it gives me a sorted dictionary, but I'm having trouble extracting a list of keys.
s_ord2 = []
for keys in s.items():
    s_ord2.append (keys)
print (s_ord2)
gives me a list of key value pairs:
[('22776', 20.25),
('22777', 22.9),
('22774', 12.45),
('22773', 9.65),
('22778', 29.2),
('22775', 15.9)]
I'm sure I'm doing something dumb, I just don't know what it is.
You're using items when you want to use keys:
In [1]: d = {'z': 3, 'b': 4, 'a': 9}
In [2]: sorted(d.keys())
Out[2]: ['a', 'b', 'z']
In [3]: sorted(d.items())
Out[3]: [('a', 9), ('b', 4), ('z', 3)]
d.items() gives you tuples of (key, value); d.keys() gives you just the keys.
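The question sorts by the float values (key=lambda x: x[1]). If the goal is the list of keys in that value order, one way is to sort the dictionary itself (which iterates over its keys) with the value lookup as the sort key. A small sketch using the data from the question; the name keys_by_value is made up:

```python
s = {'22775': 15.9, '22778': 29.2, '22776': 20.25,
     '22773': 9.65, '22777': 22.9, '22774': 12.45}

# Iterating a dict yields its keys; s.get maps each key to its value,
# so the keys come back ordered by their associated floats.
keys_by_value = sorted(s, key=s.get)
print(keys_by_value)

# Look up the float for a selected key, e.g. the first one in the list.
selected = keys_by_value[0]
print(selected, s[selected])
```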
Let's say that I have a dictionary that contains the following:
myDict = {'A':[1,2], 'B': [4,5], 'C': [1,2]}
I want to create a new dictionary, merged, that merges the keys that have the same values, so merged would be:
merged ={['A', 'C']:[1:2], 'B':[4,5]}
I have tried using the method suggested in this thread, but cannot replicate what I need.
Any suggestions?
What you have asked for is not possible. The keys in your hypothetical dictionary are mutable lists. As mutable data cannot be hashed, you can't use them as dictionary keys.
Edit: I had a go at doing what you asked for, except that the keys are all tuples. This code is a mess, but you may be able to clean it up.
myDict = {'A':[1,2],
          'B': [4,5],
          'C': [1,2],
          'D': [1, 2],
          }
# turn all values into hashable tuples
myDict2 = {k: tuple(v) for k, v in myDict.items()}
print(myDict2)
# make a set of the unique values
unique = {tuple(v) for v in myDict.values()}
print(unique) # {(1, 2), (4, 5)} (set order may vary)
# Iterate over each unique value and build a temporary shared_keys list
# of the keys that hold that value. Add the new key, value pairs to a
# new dictionary.
new_dict = {}
for value in unique:
    shared_keys = []
    for key in myDict:
        if tuple(myDict[key]) == value:
            shared_keys.append(key)
    new_dict[tuple(shared_keys)] = value
print(new_dict) # {('A', 'C', 'D'): (1, 2), ('B',): (4, 5)} (order may vary)
# change the values back into mutable lists from tuples
final_dict = {k: list(v) for k, v in new_dict.items()}
print(final_dict)
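A more compact way to get the same grouping is to collect the keys under each value in a single pass. This is only a sketch, and the function name merge_keys_by_value is made up. As above, the merged keys have to be tuples rather than lists.

```python
from collections import defaultdict

def merge_keys_by_value(mapping):
    # Group the keys under a hashable (tuple) version of each value.
    groups = defaultdict(list)
    for key, value in mapping.items():
        groups[tuple(value)].append(key)
    # Tuples of keys become the new keys; values go back to lists.
    return {tuple(keys): list(value) for value, keys in groups.items()}

myDict = {'A': [1, 2], 'B': [4, 5], 'C': [1, 2]}
merged = merge_keys_by_value(myDict)
print(merged)
```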