I need to turn a list of words into a dictionary using a dictionary comprehension. The keys should be the word lengths, and each value should be the collection of words from the original list that have that length.
I can write a regular function for this, but I can't manage it in a single dictionary comprehension.
As an example, I created a list of names of different lengths.
word_list = ["Amber", "Steven", "Carol", "Tuan", "Michael", "sam", "Wayne", "Anna", "Kay", "Jim", "D", "Belinda", "CharlieYu"]
def word_lengths_notit(input_list):
    wl_dict = {}
    for word in input_list:
        if len(word) not in wl_dict.keys():
            wl_dict[len(word)]=[] #create key that is the length of the word
            wl_dict[len(word)].append(word.lower())
        else:
            if word.lower() not in wl_dict[len(word)]:
                wl_dict[len(word)].append(word.lower())
    print(wl_dict)
word_lengths_notit(word_list)
My output:
{5: ['amber', 'carol', 'wayne'], 6: ['steven'], 4: ['tuan', 'anna'], 7: ['michael', 'belinda'], 3: ['sam', 'kay', 'jim'], 1: ['d'], 9: ['charlieyu']}
This might not be the cleanest/most efficient code (I just started learning two weeks ago), but the output is correct.
Below, I tried a dictionary comprehension, but it keeps overwriting my previous values instead of appending to them. I'm thinking I might have to use a list comprehension to collect all the words of the same length, but I'm not sure how (or if I can) create multiple lists of words of different lengths in one list comprehension.
def word_lengths(input_list):
    wl_dict = {len(word):word for word in input_list}
    print(wl_dict)
word_lengths(word_list)
Output: {5: 'Wayne', 6: 'Steven', 4: 'Anna', 7: 'Belinda', 3: 'Jim', 1: 'D', 9: 'CharlieYu'}
So you're looking to make a dict where each key is an integer and each value is a list, and you want to do it with a dict comprehension. My advice for doing it in vanilla Python is to nest a list comprehension (to filter words by length) inside the dict comprehension:
word_list = ["Amber", "Steven", "Carol", "Tuan", "Michael", "sam", "Wayne", "Anna", "Kay", "Jim", "D", "Belinda", "CharlieYu"]
word_lengths = {n: [word for word in word_list if len(word) == n]
                for n in range(10)}
If you want to avoid cases like 0: [], you could add an if clause to the end of the dict comprehension to filter them out (e.g. if len([word for word in word_list if len(word) == n])). Alternatively, you could simply make a set of all the unique lengths that are present and iterate over that:
word_list = ["Amber", "Steven", "Carol", "Tuan", "Michael", "sam", "Wayne", "Anna", "Kay", "Jim", "D", "Belinda", "CharlieYu"]
possible_lengths = set([len(word) for word in word_list])
word_lengths = {n: [word for word in word_list if len(word) == n]
                for n in possible_lengths}
The above code outputs the following on my machine:
>>> print(word_lengths)
{1: ['D'], 3: ['sam', 'Kay', 'Jim'], 4: ['Tuan', 'Anna'], 5: ['Amber', 'Carol', 'Wayne'], 6: ['Steven'], 7: ['Michael', 'Belinda'], 9: ['CharlieYu']}
Note that this solution is O(n^2) in the worst case, because the whole word list is rescanned once for every distinct length. Look into the collections module in the standard library (defaultdict, for example), which lets you get a faster solution.
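One way to follow up on the collections hint is `collections.defaultdict`, which groups the words in a single pass over the list. This is only a minimal sketch: the function name `group_by_length` is made up for illustration, and it lowercases and de-duplicates the words to mirror the loop in the question rather than the comprehension in this answer.

```python
from collections import defaultdict

def group_by_length(words):
    """Group lowercased words by length in one pass (hypothetical helper)."""
    groups = defaultdict(list)  # missing keys start out as an empty list
    for word in words:
        lowered = word.lower()
        # Skip duplicates; this membership test scans only one bucket.
        if lowered not in groups[len(word)]:
            groups[len(word)].append(lowered)
    return dict(groups)  # convert back to a plain dict for printing

word_list = ["Amber", "Steven", "Carol", "Tuan", "Michael", "sam", "Wayne",
             "Anna", "Kay", "Jim", "D", "Belinda", "CharlieYu"]
result = group_by_length(word_list)
print(result)
```

This is no longer a single dict comprehension, which is the trade-off: a comprehension cannot append to a value it has already created, so single-pass grouping needs an explicit loop.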
I have a list of strings like this:
lst = ["this", "that", "cat", "dog", "crocodile", "blah"]
And I have another list of integers like:
index_nr = [2,3]
My goal is to take the numbers from index_nr and get the list items from lst with the corresponding index number. If we stick to the example above, the desired output would be
['cat', 'dog']
Given that, 0: "this", 1: "that", 2: "cat", 3: "dog", 4: "crocodile", and 5: "blah".
I know that:
print(lst[2:4])
would give the desired output, but I'm not sure how to use the values in index_nr to achieve the same outcome.
You can use list comprehension:
lst = ["this", "that", "cat", "dog", "crocodile", "blah"]
index_nr = [2, 3]
out = [lst[index] for index in index_nr]
print(out)
Prints:
['cat', 'dog']
Or standard for-loop:
out = []
for index in index_nr:
out.append(lst[index])
print(out)
You could index lst inside index_nr like this: index_nr = [lst[2],lst[3]]
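Unlike the slice lst[2:4], the list-comprehension approach also works when the indices are not contiguous or not in ascending order. The sketch below wraps it in a small helper to show this; the name `pick` is hypothetical and not part of any library.

```python
def pick(items, indices):
    """Return the items at the given positions, in the order the indices are listed."""
    return [items[i] for i in indices]

lst = ["this", "that", "cat", "dog", "crocodile", "blah"]

print(pick(lst, [2, 3]))     # contiguous indices, same result as lst[2:4]
print(pick(lst, [5, 0, 3]))  # arbitrary order, which a single slice cannot express
```

An index outside the list raises IndexError, exactly as lst[index] would on its own.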
I want to sort a list of words by word length but maintain the original order of words with same length (stable sorting).
For example:
words = ['dog', 'cat', 'hello', 'girl', 'py', 'book']
Should become
words = ['py', 'dog', 'cat', 'girl', 'book', 'hello']
I know this can be done easily with python's sorted function, which is stable:
sorted(words, key = lambda l:len(l))
But how would one do this efficiently without access to the library? I've been trying to brainstorm a solution and I'm thinking
1. Create a dict that will contain the index (key) and length (value) of each word in words
2. Maybe use a Counter to track how many words of each length there are (for this example, it would look like Counter({3: 2, 5: 1, 4: 2, 2: 1}))
3. Go through the dict looking for the minimum word length in the Counter (2 in my example), and keep doing that for the number of times that length appears
4. Repeat step 3, moving on to the next minimum in the Counter
But this seems super inefficient. Could someone help with a better implementation?
You can just use a normal bubble-sort-style technique for this.
Algo
Loop i from 0 to len(words)-2
Nested loop j from i+1 to len(words)-1
Compare whether len(words[i]) > len(words[j])
If so, swap words[i] and words[j]
words = ['dog', 'cat', 'hello', 'girl', 'py', 'book']
# first loop
for i in range(len(words)-1):
    # second loop
    for j in range(i+1, len(words)):
        # swapping if first word is bigger than the second
        if len(words[i]) > len(words[j]):
            temp = words[i]
            words[i] = words[j]
            words[j] = temp
print(words)
Output
['py', 'cat', 'dog', 'girl', 'book', 'hello']
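This runs in O(N*N) time, so it is fairly costly, but it is usable for inputs of up to about 10^3 words. Note that this exchange sort is not stable: in the output above, 'cat' ends up before 'dog', so the original order of equal-length words is not preserved.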
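If the original order of equal-length words has to be kept, as the question asks, one quadratic alternative is an insertion sort that only shifts a word when it is strictly longer than the one being inserted. This is a sketch of a different technique from the loop above, and the function name `stable_sort_by_length` is made up for illustration.

```python
def stable_sort_by_length(words):
    """Insertion sort by word length; equal-length words keep their input order."""
    result = list(words)  # work on a copy so the caller's list is untouched
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Strict '>' is what makes this stable: words of the same length
        # as `current` are never moved past it.
        while j >= 0 and len(result[j]) > len(current):
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result

words = ['dog', 'cat', 'hello', 'girl', 'py', 'book']
print(stable_sort_by_length(words))
# ['py', 'dog', 'cat', 'girl', 'book', 'hello']
```

It is still O(N*N) in the worst case, but it only moves adjacent elements, which is why ties stay in place.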
One solution is to make dictionary buckets, where the keys are word lengths (integers) and the values are lists storing the words of that length. You can construct this dict in linear time.
Then traverse the dictionary from 0 to max_bucket, where max_bucket is the maximum length of any observed word:
words = ['dog', 'cat', 'hello', 'girl', 'py', 'book']
buckets = {}
max_bucket = -1
for w in words:
    buckets.setdefault(len(w), []).append(w)
    if len(w) > max_bucket:
        max_bucket = len(w)
out = [w for i in range(max_bucket+1) for w in buckets.get(i, [])]
print(out)
Prints:
['py', 'dog', 'cat', 'girl', 'book', 'hello']
I am trying to put them into three groups of two people each. I'm not sure where I need to go from here. I don't know how I can add two people to a group efficiently.
so it should look like this
group_one = {'2': dan, '8': tom}
group_two = {'10': james, '12': emily}
group_three = {'7': kim , '13': jones}
You could slice your list within dict comprehension to create each variable:
group_one = {i[1]:i[0] for i in my_list[0:2]}
group_two = {i[1]:i[0] for i in my_list[2:4]}
group_three = {i[1]:i[0] for i in my_list[4:6]}
>>> group_one
{2: 'dan', 8: 'tom'}
>>> group_two
{10: 'james', 12: 'emily'}
>>> group_three
{7: 'kim', 13: 'jones'}
@Sacul's code is fine, but if you want an even more compact solution, do:
>>> my_list = [['dan',2],['tom',8],['james',10],['emily',12],['kim',7],['jones',13]]
>>> teams = [{i[1] : i[0] for i in my_list[n:n+2]} for n in range(0, len(my_list), 2)]
>>> teams
[{2: 'dan', 8: 'tom'}, {10: 'james', 12: 'emily'}, {7: 'kim', 13: 'jones'}]
Now your teams are split and stored in a list of dictionaries.
To have them in separate variables like group_one, group_two and group_three, use unpacking.
>>> group_one, group_two, group_three = teams # Or simply use the code of teams
I am using Python 3.x.
I have 2 dictionaries (both very large, so I will substitute smaller ones here). The values of the dictionaries contain more than one word:
dict_a = {'key1': 'Large left panel', 'key2': 'Orange bear rug', 'key3': 'Luxo jr. lamp'}
dict_a
{'key1': 'Large left panel',
'key2': 'Orange bear rug',
'key3': 'Luxo jr. lamp'}
dict_b = {'keyX': 'titanium panel', 'keyY': 'orange Ball and chain', 'keyZ': 'large bear musket'}
dict_b
{'keyX': 'titanium panel',
'keyY': 'orange Ball and chain',
'keyZ': 'large bear musket'}
I am looking for a way to compare the individual words contained in the values of dict_a to the words contained in the values of dict_b and return a dictionary or data-frame that contains the word, and the keys from dict_a and dict_b it is associated with:
My desired output (not formatted any certain way):
bear: key2 (from dict_a), keyZ(from dict_b)
Luxo: key3
orange: key2 (from dict_a), keyY (from dict_b)
I've got code that works for looking up a specific word in a single dictionary but it's not sufficient for what I need to accomplish here:
def search(myDict, lookup):
    aDict = {}
    for key, value in myDict.items():
        for v in value:
            if lookup in v:
                aDict[key] = value
                print(key, value)  # show each match as it is found
    return aDict
dicts = {'a': {'key1': 'Large left panel', 'key2': 'Orange bear rug',
               'key3': 'Luxo jr. lamp'},
         'b': {'keyX': 'titanium panel', 'keyY': 'orange Ball and chain',
               'keyZ': 'large bear musket'} }
from collections import defaultdict
index = defaultdict(list)
for dname, d in dicts.items():
    for key, words in d.items():
        for word in words.lower().split(): # lower() to make Orange/orange match
            index[word].append((dname, key))
index now contains:
{'and' : [('b', 'keyY')],
'ball' : [('b', 'keyY')],
'bear' : [('a', 'key2'), ('b', 'keyZ')],
'chain' : [('b', 'keyY')],
'jr.' : [('a', 'key3')],
'lamp' : [('a', 'key3')],
'large' : [('a', 'key1'), ('b', 'keyZ')],
'left' : [('a', 'key1')],
'luxo' : [('a', 'key3')],
'musket' : [('b', 'keyZ')],
'orange' : [('a', 'key2'), ('b', 'keyY')],
'panel' : [('a', 'key1'), ('b', 'keyX')],
'rug' : [('a', 'key2')],
'titanium': [('b', 'keyX')] }
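The desired output in the question pairs each word with the keys it came from. As a rough sketch building on the index above, the words that occur in both dictionaries can be picked out by counting how many distinct dictionary names appear in each entry. The name `shared` is introduced here for illustration only.

```python
from collections import defaultdict

dicts = {'a': {'key1': 'Large left panel', 'key2': 'Orange bear rug',
               'key3': 'Luxo jr. lamp'},
         'b': {'keyX': 'titanium panel', 'keyY': 'orange Ball and chain',
               'keyZ': 'large bear musket'}}

# Build the word -> [(dict name, key), ...] index exactly as above.
index = defaultdict(list)
for dname, d in dicts.items():
    for key, words in d.items():
        for word in words.lower().split():
            index[word].append((dname, key))

# Keep only words seen under more than one dictionary name.
shared = {word: hits for word, hits in index.items()
          if len({dname for dname, _ in hits}) > 1}

for word in sorted(shared):
    print(word, shared[word])
```

With the sample data this leaves 'bear', 'large', 'orange' and 'panel'.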
Update to comments
Since your actual dictionary is a mapping from string to list (and not string to string) change your loops to
for dname, d in dicts.items():
    for key, wordlist in d.items(): # changed "words" to "wordlist"
        for words in wordlist: # added extra loop to iterate over wordlist
            for word in words.split(): # removed .lower() since text is always uppercase
                index[word].append((dname, key))
Since your lists have only one item you could just do
for dname, d in dicts.items():
    for key, wordlist in d.items():
        for word in wordlist[0].split(): # assumes single item list
            index[word].append((dname, key))
If you have words that you don't want to be added to your index you can skip adding them to the index:
words_to_skip = {'-', ';', '/', 'AND', 'TO', 'UP', 'WITH', ''}
Then filter them out with
if word in words_to_skip:
    continue
I noticed that you have some words surrounded by parentheses (such as (342) and (221)). If you want to get rid of the parentheses, do
if word[0] == '(' and word[-1] == ')':
    word = word[1:-1]
Putting this all together we get
words_to_skip = {'-', ';', '/', 'AND', 'TO', 'UP', 'WITH', ''}
for dname, d in dicts.items():
    for key, wordlist in d.items():
        for word in wordlist[0].split(): # assumes single item list
            if word[0] == '(' and word[-1] == ')':
                word = word[1:-1] # remove outer parenthesis
            if word in words_to_skip: # skip unwanted words
                continue
            index[word].append((dname, key))
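For completeness, here is a self-contained version of the combined loop that can be run as-is. The sample data is hypothetical, since the real dictionaries are not shown here; it only assumes what the update describes: uppercase text, single-item lists, and some words wrapped in parentheses. The function name `build_index` is also made up for this sketch.

```python
from collections import defaultdict

# Hypothetical sample data: each value is a single-item list of uppercase text.
dicts = {
    'a': {'key1': ['LARGE LEFT PANEL (342)'], 'key2': ['ORANGE BEAR RUG WITH TAG']},
    'b': {'keyX': ['TITANIUM PANEL - (342)'], 'keyY': ['ORANGE BALL AND CHAIN']},
}
words_to_skip = {'-', ';', '/', 'AND', 'TO', 'UP', 'WITH', ''}

def build_index(dicts, words_to_skip):
    """Map each kept word to the (dict name, key) pairs it appears under."""
    index = defaultdict(list)
    for dname, d in dicts.items():
        for key, wordlist in d.items():
            for word in wordlist[0].split():  # assumes single item list
                if word[0] == '(' and word[-1] == ')':
                    word = word[1:-1]  # remove outer parentheses
                if word in words_to_skip:  # also catches '' left over from '()'
                    continue
                index[word].append((dname, key))
    return index

index = build_index(dicts, words_to_skip)
for word in sorted(index):
    print(word, index[word])
```

Stripping the parentheses before the skip check matters: it lets the empty string in words_to_skip catch a bare '()'.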
I think you can do what you want pretty easily. This code produces output in the format {word: {key: name_of_dict_the_key_is_in}}:
def search(**dicts):
    result = {}
    for name, dct in dicts.items():
        for key, value in dct.items():
            for word in value.split():
                result.setdefault(word, {})[key] = name
    return result
You call it with the input dictionaries as keyword arguments. The keyword you use for each dictionary will be the string used to describe it in the output dictionary, so use something like search(dict_a=dict_a, dict_b=dict_b).
If your dictionaries might have some of the same keys, this code might not work right, since the keys could collide if they have the same words in their values. You could make the outer dict contain a list of (key, name) tuples, instead of an inner dictionary, I suppose. Just change the assignment line to result.setdefault(word, []).append((key, name)). That would be less handy to search in though.
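To make the two variants concrete, here is a short runnable sketch that applies both to the dictionaries from the question. The name `search_pairs` for the list-of-tuples variant is made up for this example.

```python
def search(**dicts):
    """Return {word: {key: name_of_dict_the_key_is_in}}."""
    result = {}
    for name, dct in dicts.items():
        for key, value in dct.items():
            for word in value.split():
                result.setdefault(word, {})[key] = name
    return result

def search_pairs(**dicts):
    """Variant returning {word: [(key, name), ...]}, safe when keys collide."""
    result = {}
    for name, dct in dicts.items():
        for key, value in dct.items():
            for word in value.split():
                result.setdefault(word, []).append((key, name))
    return result

dict_a = {'key1': 'Large left panel', 'key2': 'Orange bear rug', 'key3': 'Luxo jr. lamp'}
dict_b = {'keyX': 'titanium panel', 'keyY': 'orange Ball and chain', 'keyZ': 'large bear musket'}

by_key = search(dict_a=dict_a, dict_b=dict_b)
pairs = search_pairs(dict_a=dict_a, dict_b=dict_b)

print(by_key['bear'])   # {'key2': 'dict_a', 'keyZ': 'dict_b'}
print(pairs['panel'])   # [('key1', 'dict_a'), ('keyX', 'dict_b')]
```

Matching here is case-sensitive, so 'Orange' and 'orange' end up as separate entries; calling value.lower().split() instead would merge them.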
I have a list of lists like this
list1 = [['I am a student'], ['I come from China'], ['I study computer science']]
len(list1) = 3
Now I would like to convert it into a list of string like this
list2 = ['I', 'am', 'a', 'student','I', 'come', 'from', 'China', 'I','study','computer','science']
len(list2) = 12
I am aware that I could convert it this way
new_list = [','.join(x) for x in list1]
But it returns
['I am a student', 'I come from China', 'I study computer science']
len(new_list) = 3
I also tried this
new_list = [''.join(x for x in list1)]
but it gives the following error
TypeError: sequence item 0: expected str instance, list found
How can I extract each word in the sublist of list1 and convert it into a list of string? I'm using python 3 in windows 7.
Following your edit, I think the most transparent approach is now the one that was adopted by another answer (an answer which has since been deleted, I think). I've added some whitespace to make it easier to understand what's going on:
list1 = [['I am a student'], ['I come from China'], ['I study computer science']]
list2 = [
word
for sublist in list1
for sentence in sublist
for word in sentence.split()
]
print(list2)
Prints:
['I', 'am', 'a', 'student', 'I', 'come', 'from', 'China', 'I', 'study', 'computer', 'science']
Given a list of lists where each sublist contains strings, this could be solved using jez's strategy like:
list2 = ' '.join([' '.join(strings) for strings in list1]).split()
Where the list comprehension transforms list1 to a list of strings:
>>> [' '.join(strings) for strings in list1]
['I am a student', 'I come from China', 'I study computer science']
The join will then create a string from the strings and split will create a list split on spaces.
If the sublists only contain single strings, you could simplify the list comprehension:
list2 = ' '.join([l[0] for l in list1]).split()