How do I sort a list of dictionaries by a specific key's value? Given:
[{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age': 10}]
When sorted by name, it should become:
[{'name': 'Bart', 'age': 10}, {'name': 'Homer', 'age': 39}]
The sorted() function takes a key= parameter
newlist = sorted(list_to_be_sorted, key=lambda d: d['name'])
Alternatively, you can use operator.itemgetter instead of defining the function yourself
from operator import itemgetter
newlist = sorted(list_to_be_sorted, key=itemgetter('name'))
For completeness, add reverse=True to sort in descending order
newlist = sorted(list_to_be_sorted, key=itemgetter('name'), reverse=True)
import operator
To sort the list of dictionaries by key='name':
list_of_dicts.sort(key=operator.itemgetter('name'))
To sort the list of dictionaries by key='age':
list_of_dicts.sort(key=operator.itemgetter('age'))
my_list = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]
my_list.sort(lambda x,y : cmp(x['name'], y['name'])) # Python 2 only: cmp() and comparison-function arguments were removed in Python 3
my_list will now be what you want.
Or better:
Since Python 2.4, there's a key argument, which is both more efficient and neater:
my_list = sorted(my_list, key=lambda k: k['name'])
...the lambda is, IMO, easier to understand than operator.itemgetter, but your mileage may vary.
If you want to sort the list by multiple keys, you can do the following:
my_list = [{'name':'Homer', 'age':39}, {'name':'Milhouse', 'age':10}, {'name':'Bart', 'age':10} ]
sortedlist = sorted(my_list , key=lambda elem: "%02d %s" % (elem['age'], elem['name']))
It is rather hackish, since it relies on converting the values into a single string representation for comparison. It works for non-negative integers as long as you zero-pad them to a fixed width; negative numbers do not compare correctly once they are turned into strings.
a = [{'name':'Homer', 'age':39}, ...]
# This changes the list a
a.sort(key=lambda k : k['name'])
# This returns a new list (a is not modified)
sorted(a, key=lambda k : k['name'])
import operator
a_list_of_dicts.sort(key=operator.itemgetter('name'))
'key' is used to sort by an arbitrary value, and 'itemgetter' sets that value to each item's 'name' entry (the value stored under the 'name' key).
I guess you've meant:
[{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]
This would be sorted like this:
sorted(l,cmp=lambda x,y: cmp(x['name'],y['name'])) # Python 2 only; Python 3 dropped the cmp argument
You could use a custom comparison function, or you could pass in a function that calculates a custom sort key. That's usually more efficient as the key is only calculated once per item, while the comparison function would be called many more times.
You could do it this way:
def mykey(adict): return adict['name']
x = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age':10}]
sorted(x, key=mykey)
But the standard library contains a generic routine for getting items of arbitrary objects: itemgetter. So try this instead:
from operator import itemgetter
x = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age':10}]
sorted(x, key=itemgetter('name'))
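To illustrate the point about a key being calculated only once per item, here is a minimal sketch that counts the calls. The names people, name_key and name_cmp are made up for this example, and functools.cmp_to_key is used because Python 3 no longer accepts a comparison function directly:

```python
from functools import cmp_to_key

# Hypothetical sample data, not from the question.
people = [{'name': n} for n in
          ['Homer', 'Bart', 'Lisa', 'Marge', 'Maggie', 'Abe', 'Ned', 'Moe']]

key_calls = 0
def name_key(d):
    """Key function: called once per item."""
    global key_calls
    key_calls += 1
    return d['name']

cmp_calls = 0
def name_cmp(a, b):
    """Comparison function: called once per comparison made by the sort."""
    global cmp_calls
    cmp_calls += 1
    return (a['name'] > b['name']) - (a['name'] < b['name'])

by_key = sorted(people, key=name_key)
by_cmp = sorted(people, key=cmp_to_key(name_cmp))

print('key function calls:', key_calls)
print('comparison function calls:', cmp_calls)
```

Both calls produce the same ordering; only the number of function calls differs, and the gap grows with the length of the list.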
Using the Schwartzian transform from Perl,
py = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]
do
sort_on = "name"
decorated = [(dict_[sort_on], dict_) for dict_ in py]
decorated.sort()
result = [dict_ for (key, dict_) in decorated]
gives
>>> result
[{'age': 10, 'name': 'Bart'}, {'age': 39, 'name': 'Homer'}]
More on the Perl Schwartzian transform:
In computer science, the Schwartzian transform is a Perl programming
idiom used to improve the efficiency of sorting a list of items. This
idiom is appropriate for comparison-based sorting when the ordering is
actually based on the ordering of a certain property (the key) of the
elements, where computing that property is an intensive operation that
should be performed a minimal number of times. The Schwartzian
Transform is notable in that it does not use named temporary arrays.
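One caveat with the decorate-sort-undecorate code above: in Python 3, if two dictionaries have the same value for the sort key, the tuple comparison falls through to comparing the dictionaries themselves, which raises TypeError. A common workaround, sketched here with a made-up list that contains a duplicate name, is to put the original index into the tuple as a tie-breaker:

```python
# Hypothetical data with two equal names, to exercise the tie-breaker.
people = [{'name': 'Homer', 'age': 39},
          {'name': 'Bart', 'age': 10},
          {'name': 'Bart', 'age': 35}]

sort_on = 'name'

# Decorate: (sort value, original position, dict).
# The position is unique, so the dicts are never compared.
decorated = [(d[sort_on], i, d) for i, d in enumerate(people)]
decorated.sort()

# Undecorate: keep only the dict.
result = [d for (_, _, d) in decorated]
print(result)
```

The index also keeps equal items in their original relative order, which matches what sorted() with a key does.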
Sometimes we need to use lower() for case-insensitive sorting. For example,
lists = [{'name':'Homer', 'age':39},
{'name':'Bart', 'age':10},
{'name':'abby', 'age':9}]
lists = sorted(lists, key=lambda k: k['name'])
print(lists)
# Bart, Homer, abby
# [{'name':'Bart', 'age':10}, {'name':'Homer', 'age':39}, {'name':'abby', 'age':9}]
lists = sorted(lists, key=lambda k: k['name'].lower())
print(lists)
# abby, Bart, Homer
# [ {'name':'abby', 'age':9}, {'name':'Bart', 'age':10}, {'name':'Homer', 'age':39}]
You have to implement your own comparison function that will compare the dictionaries by values of name keys. See Sorting Mini-HOW TO from PythonInfo Wiki
Using the Pandas package is another method, though its runtime at large scale is much slower than the more traditional methods proposed by others:
import pandas as pd
listOfDicts = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]
df = pd.DataFrame(listOfDicts)
df = df.sort_values('name')
sorted_listOfDicts = df.T.to_dict().values()
Here are some benchmark values for a tiny list and a large (100k+) list of dicts:
setup_large = "listOfDicts = [];\
[listOfDicts.extend(({'name':'Homer', 'age':39}, {'name':'Bart', 'age':10})) for _ in range(50000)];\
from operator import itemgetter;import pandas as pd;\
df = pd.DataFrame(listOfDicts);"
setup_small = "listOfDicts = [];\
listOfDicts.extend(({'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}));\
from operator import itemgetter;import pandas as pd;\
df = pd.DataFrame(listOfDicts);"
method1 = "newlist = sorted(listOfDicts, key=lambda k: k['name'])"
method2 = "newlist = sorted(listOfDicts, key=itemgetter('name')) "
method3 = "df = df.sort_values('name');\
sorted_listOfDicts = df.T.to_dict().values()"
import timeit
t = timeit.Timer(method1, setup_small)
print('Small Method LC: ' + str(t.timeit(100)))
t = timeit.Timer(method2, setup_small)
print('Small Method LC2: ' + str(t.timeit(100)))
t = timeit.Timer(method3, setup_small)
print('Small Method Pandas: ' + str(t.timeit(100)))
t = timeit.Timer(method1, setup_large)
print('Large Method LC: ' + str(t.timeit(100)))
t = timeit.Timer(method2, setup_large)
print('Large Method LC2: ' + str(t.timeit(100)))
t = timeit.Timer(method3, setup_large)
print('Large Method Pandas: ' + str(t.timeit(1)))
#Small Method LC: 0.000163078308105
#Small Method LC2: 0.000134944915771
#Small Method Pandas: 0.0712950229645
#Large Method LC: 0.0321750640869
#Large Method LC2: 0.0206089019775
#Large Method Pandas: 5.81405615807
Here is the alternative general solution - it sorts elements of a dict by keys and values.
The advantage of it - no need to specify keys, and it would still work if some keys are missing in some of dictionaries.
def sort_key_func(item):
    """ Helper function used to sort list of dicts
    :param item: dict
    :return: sorted list of tuples (k, v)
    """
    pairs = []
    for k, v in item.items():
        pairs.append((k, v))
    return sorted(pairs)
sorted(A, key=sort_key_func)
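The snippet above does not define A, so here is a small usage sketch with a made-up list in which one dictionary is missing the 'age' key. The helper is repeated in a shorter form so the example runs on its own. It assumes that values stored under the same key are comparable with each other:

```python
def sort_key_func(item):
    """Return the dict's (key, value) pairs as a sorted list."""
    return sorted(item.items())

# Hypothetical input; the last dict has no 'age' key.
A = [{'name': 'Homer', 'age': 39},
     {'name': 'Bart', 'age': 10},
     {'name': 'Abe'}]

result = sorted(A, key=sort_key_func)
print(result)
```

Because the pairs are sorted by key name first, the dictionaries that have an 'age' entry are ordered by age and come before the one that only has 'name'. The ordering depends on which keys exist, so it is worth checking that this is the order you actually want.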
If you do not need the original list of dictionaries, you could modify it in-place with sort() method using a custom key function.
Key function:
def get_name(d):
    """ Return the value of a key in a dictionary. """
    return d["name"]
The list to be sorted:
data_one = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age': 10}]
Sorting it in-place:
data_one.sort(key=get_name)
If you need the original list, call the sorted() function passing it the list and the key function, then assign the returned sorted list to a new variable:
data_two = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age': 10}]
new_data = sorted(data_two, key=get_name)
Printing data_one and new_data.
>>> print(data_one)
[{'name': 'Bart', 'age': 10}, {'name': 'Homer', 'age': 39}]
>>> print(new_data)
[{'name': 'Bart', 'age': 10}, {'name': 'Homer', 'age': 39}]
Let's say I have a dictionary D with the elements below. To sort, just use the key argument in sorted to pass a custom function as below:
D = {'eggs': 3, 'ham': 1, 'spam': 2}
def get_count(pair): # renamed from "tuple" to avoid shadowing the built-in
    return pair[1]
sorted(D.items(), key = get_count, reverse=True)
# Or
sorted(D.items(), key = lambda x: x[1], reverse=True) # Avoiding get_count function call
Check this out.
I have been a big fan of a filter with lambda. However, it is not the best option if you consider execution time.
First option
sorted_list = sorted(list_to_sort, key= lambda x: x['name'])
# Returns a new sorted list
Second option
list_to_sort.sort(key=operator.itemgetter('name'))
# Edits the list, and does not return a new list
Fast comparison of execution times
# First option
python3.6 -m timeit -s "list_to_sort = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}, {'name':'Faaa', 'age':57}, {'name':'Errr', 'age':20}]" -s "sorted_l=[]" "sorted_l = sorted(list_to_sort, key=lambda e: e['name'])"
1000000 loops, best of 3: 0.736 µsec per loop
# Second option
python3.6 -m timeit -s "list_to_sort = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}, {'name':'Faaa', 'age':57}, {'name':'Errr', 'age':20}]" -s "sorted_l=[]" -s "import operator" "list_to_sort.sort(key=operator.itemgetter('name'))"
1000000 loops, best of 3: 0.438 µsec per loop
If performance is a concern, I would use operator.itemgetter instead of lambda as built-in functions perform faster than hand-crafted functions. The itemgetter function seems to perform approximately 20% faster than lambda based on my testing.
From https://wiki.python.org/moin/PythonSpeed:
Likewise, the builtin functions run faster than hand-built equivalents. For example, map(operator.add, v1, v2) is faster than map(lambda x,y: x+y, v1, v2).
Here is a comparison of sorting speed using lambda vs itemgetter.
import random
import string
import operator
# Create a list of 100 dicts with random 8-letter names and random ages from 0 to 100.
l = [{'name': ''.join(random.choices(string.ascii_lowercase, k=8)), 'age': random.randint(0, 100)} for i in range(100)]
# Test the performance with a lambda function sorting on name
%timeit sorted(l, key=lambda x: x['name'])
13 µs ± 388 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# Test the performance with itemgetter sorting on name
%timeit sorted(l, key=operator.itemgetter('name'))
10.7 µs ± 38.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# Check that each technique produces the same sort order
sorted(l, key=lambda x: x['name']) == sorted(l, key=operator.itemgetter('name'))
True
Both techniques sort the list in the same order (verified by execution of the final statement in the code block), but the second one (itemgetter) is a little faster.
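The %timeit lines above only work in IPython or Jupyter. For a plain Python script, a rough equivalent using the standard timeit module could look like the sketch below. The variable names and the repeat count of 2000 are arbitrary choices for this example, and the timings will differ from machine to machine:

```python
import random
import string
import timeit
from operator import itemgetter

random.seed(0)  # fixed seed so the test data is reproducible

# 100 dicts with random 8-letter names and random ages from 0 to 100.
people = [{'name': ''.join(random.choices(string.ascii_lowercase, k=8)),
           'age': random.randint(0, 100)}
          for _ in range(100)]

# Total time in seconds for 2000 sorts with each kind of key.
lambda_time = timeit.timeit(
    lambda: sorted(people, key=lambda d: d['name']), number=2000)
itemgetter_time = timeit.timeit(
    lambda: sorted(people, key=itemgetter('name')), number=2000)

print(f'lambda:     {lambda_time:.4f} s')
print(f'itemgetter: {itemgetter_time:.4f} s')

# Both keys should give the same order.
same_order = (sorted(people, key=lambda d: d['name'])
              == sorted(people, key=itemgetter('name')))
print(same_order)
```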
As indicated by @Claudiu to @monojohnny in the comment section of this answer, given:
list_to_be_sorted = [
{'name':'Homer', 'age':39},
{'name':'Milhouse', 'age':10},
{'name':'Bart', 'age':10}
]
to sort the list of dictionaries by key 'age', 'name'
(like in SQL statement ORDER BY age, name), you can use:
newlist = sorted( list_to_be_sorted, key=lambda k: (k['age'], k['name']) )
or, likewise
import operator
newlist = sorted( list_to_be_sorted, key=operator.itemgetter('age','name') )
print(newlist)
[{'name': 'Bart', 'age': 10}, {'name': 'Milhouse', 'age': 10},
{'name': 'Homer', 'age': 39}]
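If one of the fields should be descending (like ORDER BY age DESC, name), a tuple key still works when the descending field is numeric, because the number can be negated. This is a small sketch of that idea, not something from the comments referred to above:

```python
list_to_be_sorted = [
    {'name': 'Homer', 'age': 39},
    {'name': 'Milhouse', 'age': 10},
    {'name': 'Bart', 'age': 10},
]

# Negating age reverses its order; name stays ascending.
newlist = sorted(list_to_be_sorted, key=lambda k: (-k['age'], k['name']))
print(newlist)
```

This trick only applies to values that can be negated; it does not help for a descending string field.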
You can use the following:
lst = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age': 10}]
sorted_lst = sorted(lst, key=lambda x: x['age']) # change this to sort by a different field
print(sorted_lst)
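Sorting by multiple columns, with descending order on some of them:
The cmps list is global to the comparison function. It contains (field name, inv) pairs, where inv is -1 for descending and 1 for ascending. Python 3 removed cmp() and the cmp argument of sorted(), so cmp is defined by hand here and the comparison function is wrapped with functools.cmp_to_key.
from functools import cmp_to_key
def cmp(a, b):
    # Same result as the Python 2 built-in: -1, 0 or 1
    return (a > b) - (a < b)
def cmpfun(a, b):
    for (name, inv) in cmps:
        res = cmp(a[name], b[name])
        if res != 0:
            return res * inv
    return 0
data = [
    dict(name='alice', age=10),
    dict(name='baruch', age=9),
    dict(name='alice', age=11),
]
all_cmps = [
    [('name', 1), ('age', -1)],
    [('name', 1), ('age', 1)],
    [('name', -1), ('age', 1)],]
print('data:', data)
for cmps in all_cmps:
    print('sort:', cmps)
    print(sorted(data, key=cmp_to_key(cmpfun)))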
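An alternative that avoids a comparison function altogether relies on Python's sort being stable: sort by the least significant field first, then by the more significant one, each pass with its own reverse flag. A minimal sketch using the same data, ordering by name ascending and age descending:

```python
from operator import itemgetter

data = [
    dict(name='alice', age=10),
    dict(name='baruch', age=9),
    dict(name='alice', age=11),
]

# Pass 1: secondary field (age), descending.
by_age_desc = sorted(data, key=itemgetter('age'), reverse=True)

# Pass 2: primary field (name), ascending. Items with equal names
# keep the age order from pass 1 because the sort is stable.
result = sorted(by_age_desc, key=itemgetter('name'))
print(result)
```

This works for any comparable field type, including strings, at the cost of one sort pass per field.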
I have a list named list1:
list1 = [['DC1', 'C4'], ['DC1', 'C5'], ['DC3', 'C1'], ['DC3', 'C2'], ['DC3', 'C3']]
I want to make two new lists:
list1_1 = ['DC1', 'C4', 'C5']
list1_2 = ['DC3', 'C1', 'C2', 'C3']
Can anyone please show me how to do this?
Thank you.
This can solve your problem. Note: this is not an optimized solution.
yourlist = [['DC1', 'C4'], ['DC1', 'C5'], ['DC3', 'C1'], ['DC3', 'C2'], ['DC3', 'C3']]
temp_dict = {}
for i in yourlist:
    if i[0] not in temp_dict:
        temp_dict.update({i[0]:[i[1]]})
    else:
        temp_dict[i[0]].append(i[1])
final_list =[]
for i,j in temp_dict.items():
    temp_list =[i]
    for k in j:
        temp_list.append(k)
    final_list.append(temp_list)
list1_1 = final_list[0]
list1_2 = final_list[1]
Output:
list1_1
['DC1', 'C4', 'C5']
list1_2
['DC3', 'C1', 'C2', 'C3']
Building on Tamil Selvan's answer, but using a defaultdict for simplicity and list concatenation instead of appends for the big list.
from collections import defaultdict
list1 = [['DC1', 'C4'], ['DC1', 'C5'], ['DC3', 'C1'], ['DC3', 'C2'], ['DC3', 'C3']]
# First we create a dict containing the left terms as keys and the right terms as values.
d = defaultdict(list)
for (key, value) in list1:
    d[key].append(value)
print(d)
# {'DC1': ['C4', 'C5'],
# 'DC3': ['C1', 'C2', 'C3']}
# Then for each key, we create a list of values with the key as first item.
lists = []
for key, values in d.items():
    sublist = [key, *values]
    lists.append(sublist)
print(lists)
# [['DC1', 'C4', 'C5'],
# ['DC3', 'C1', 'C2', 'C3']]
# Finally, you can easily take the sublist that you want.
list1_1, list1_2 = lists
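For comparison, the same grouping can probably be written with itertools.groupby. This is only a sketch of an alternative, not part of the answer above. groupby only merges adjacent items, so the list is sorted by its first element beforehand:

```python
from itertools import groupby
from operator import itemgetter

list1 = [['DC1', 'C4'], ['DC1', 'C5'], ['DC3', 'C1'], ['DC3', 'C2'], ['DC3', 'C3']]

first = itemgetter(0)

# Sort by the left term so equal keys are adjacent, then build
# one [key, value, value, ...] list per group.
lists = [[key, *(value for _, value in group)]
         for key, group in groupby(sorted(list1, key=first), key=first)]

list1_1, list1_2 = lists
print(list1_1)
print(list1_2)
```

The sort is stable, so the values inside each group keep the order they had in list1.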
I am trying to create a method that sorts a list of variables into clumps of size four, with the same characters grouped together and in the same order as they are given. You may assume the only given characters are a, b, and c. For example, here I would like to sort myInitialList.
myInitialList = ['b1', 'c1', 'b2', 'c2', 'c3', 'b3', 'c4', 'a1', 'b4', 'b5', 'a2', 'c5', 'a3', 'a4', 'a5', 'c6', 'a6', 'a7', 'a8','a9']
endList = clumpsSize4(myInitialList)
print(endList)
This should output the result:
['a1','a2','a3','a4','b1','b2','b3','b4','c1','c2','c3','c4','a5','a6','a7','a8','b5','c5','c6','a9']
How do I write the clumpsSize4 method?
This is not the most efficient, but here is my attempt. Sort the input. Have one default dict groupNums which links a letter to the current number clump it is on. Have another default dict groups which contains the actual clumps. Sort the groups at the end, iterate over them and join:
from collections import defaultdict
def clump(l, size=4):
    groups = defaultdict(list)
    groupNums = defaultdict(int)
    l = sorted(l)
    for i in l:
        letter = i[0]
        key = str(groupNums[letter]) + letter
        groups[key].append(i)
        if len(groups[key]) == size:
            groupNums[letter] += 1
    result = []
    for _, g in sorted(groups.items()):
        result += g
    return result
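Since the question asks for items "in the same order as they are given", here is a sketch of a variant that does not sort the items themselves, only the letters. The function name clumps_in_given_order is made up for this example, and it assumes the first character of each item is its group letter:

```python
from collections import defaultdict

def clumps_in_given_order(items, size=4):
    """Group items by first letter, then emit them in rounds of `size`."""
    by_letter = defaultdict(list)
    for item in items:
        by_letter[item[0]].append(item)  # keeps the given order per letter

    result = []
    round_no = 0
    while True:
        added = False
        for letter in sorted(by_letter):
            chunk = by_letter[letter][round_no * size:(round_no + 1) * size]
            if chunk:
                result += chunk
                added = True
        if not added:  # every letter is exhausted
            break
        round_no += 1
    return result

myInitialList = ['b1', 'c1', 'b2', 'c2', 'c3', 'b3', 'c4', 'a1', 'b4', 'b5',
                 'a2', 'c5', 'a3', 'a4', 'a5', 'c6', 'a6', 'a7', 'a8', 'a9']
endList = clumps_in_given_order(myInitialList)
print(endList)
```

Because the items are never sorted as strings, a value such as 'a10' would stay wherever it appears in the input instead of being moved in front of 'a2'.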
I am using python 3.x,
I have 2 dictionaries (both very large but will substitute here). The values of the dictionaries contain more than one word:
dict_a = {'key1': 'Large left panel', 'key2': 'Orange bear rug', 'key3': 'Luxo jr. lamp'}
dict_a
{'key1': 'Large left panel',
'key2': 'Orange bear rug',
'key3': 'Luxo jr. lamp'}
dict_b = {'keyX': 'titanium panel', 'keyY': 'orange Ball and chain', 'keyZ': 'large bear musket'}
dict_b
{'keyX': 'titanium panel',
'keyY': 'orange Ball and chain',
'keyZ': 'large bear musket'}
I am looking for a way to compare the individual words contained in the values of dict_a to the words contained in the values of dict_b and return a dictionary or data-frame that contains the word, and the keys from dict_a and dict_b it is associated with:
My desired output (not formatted any certain way):
bear: key2 (from dict_a), keyZ(from dict_b)
Luxo: key3
orange: key2 (from dict_a), keyY (from dict_b)
I've got code that works for looking up a specific word in a single dictionary but it's not sufficient for what I need to accomplish here:
def search(myDict, lookup):
    aDict = {}
    for key, value in myDict.items():
        for v in value:
            if lookup in v:
                aDict[key] = value
    return aDict
print (key, value)
dicts = {'a': {'key1': 'Large left panel', 'key2': 'Orange bear rug',
'key3': 'Luxo jr. lamp'},
'b': {'keyX': 'titanium panel', 'keyY': 'orange Ball and chain',
'keyZ': 'large bear musket'} }
from collections import defaultdict
index = defaultdict(list)
for dname, d in dicts.items():
    for key, words in d.items():
        for word in words.lower().split(): # lower() to make Orange/orange match
            index[word].append((dname, key))
index now contains:
{'and' : [('b', 'keyY')],
'ball' : [('b', 'keyY')],
'bear' : [('a', 'key2'), ('b', 'keyZ')],
'chain' : [('b', 'keyY')],
'jr.' : [('a', 'key3')],
'lamp' : [('a', 'key3')],
'large' : [('a', 'key1'), ('b', 'keyZ')],
'left' : [('a', 'key1')],
'luxo' : [('a', 'key3')],
'musket' : [('b', 'keyZ')],
'orange' : [('a', 'key2'), ('b', 'keyY')],
'panel' : [('a', 'key1'), ('b', 'keyX')],
'rug' : [('a', 'key2')],
'titanium': [('b', 'keyX')] }
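If only the words that occur in both dictionaries are of interest, the index can be filtered afterwards. The following sketch rebuilds the index so that it runs on its own; the name shared is made up for this example:

```python
from collections import defaultdict

dicts = {'a': {'key1': 'Large left panel', 'key2': 'Orange bear rug',
               'key3': 'Luxo jr. lamp'},
         'b': {'keyX': 'titanium panel', 'keyY': 'orange Ball and chain',
               'keyZ': 'large bear musket'}}

index = defaultdict(list)
for dname, d in dicts.items():
    for key, words in d.items():
        for word in words.lower().split():
            index[word].append((dname, key))

# Keep only the words whose hits cover every dictionary name.
shared = {word: hits for word, hits in index.items()
          if {dname for dname, _ in hits} == set(dicts)}

for word in sorted(shared):
    print(word, shared[word])
```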
Update to comments
Since your actual dictionary is a mapping from string to list (and not string to string) change your loops to
for dname, d in dicts.items():
    for key, wordlist in d.items(): # changed "words" to "wordlist"
        for words in wordlist: # added extra loop to iterate over wordlist
            for word in words.split(): # removed .lower() since text is always uppercase
                index[word].append((dname, key))
Since your lists have only one item you could just do
for dname, d in dicts.items():
    for key, wordlist in d.items():
        for word in wordlist[0].split(): # assumes single item list
            index[word].append((dname, key))
If you have words that you don't want to be added to your index you can skip adding them to the index:
words_to_skip = {'-', ';', '/', 'AND', 'TO', 'UP', 'WITH', ''}
Then filter them out with
if word in words_to_skip:
    continue
I noticed that you have some words surrounded by parentheses (such as (342) and (221)). If you want to get rid of the parentheses, do
if word[0] == '(' and word[-1] == ')':
    word = word[1:-1]
Putting this all together we get
words_to_skip = {'-', ';', '/', 'AND', 'TO', 'UP', 'WITH', ''}
for dname, d in dicts.items():
    for key, wordlist in d.items():
        for word in wordlist[0].split(): # assumes single item list
            if word[0] == '(' and word[-1] == ')':
                word = word[1:-1] # remove outer parentheses
            if word in words_to_skip: # skip unwanted words
                continue
            index[word].append((dname, key))
I think you can do what you want pretty easily. This code produces output in the format {word: {key: name_of_dict_the_key_is_in}}:
def search(**dicts):
    result = {}
    for name, dct in dicts.items():
        for key, value in dct.items():
            for word in value.split():
                result.setdefault(word, {})[key] = name
    return result
You call it with the input dictionaries as keyword arguments. The keyword you use for each dictionary will be the string used to describe it in the output dictionary, so use something like search(dict_a=dict_a, dict_b=dict_b).
If your dictionaries might have some of the same keys, this code might not work right, since the keys could collide if they have the same words in their values. You could make the outer dict contain a list of (key, name) tuples, instead of an inner dictionary, I suppose. Just change the assignment line to result.setdefault(word, []).append((key, name)). That would be less handy to search in though.
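Here is a sketch of the list-of-tuples variant described above, run on the dictionaries from the question. The function name search_pairs is made up, and the .lower() call is an addition (an assumption about what is wanted) so that 'Orange' and 'orange' end up under the same word:

```python
def search_pairs(**dicts):
    """Map each word to a list of (key, dict_name) tuples."""
    result = {}
    for name, dct in dicts.items():
        for key, value in dct.items():
            for word in value.lower().split():  # lower() is an assumption
                result.setdefault(word, []).append((key, name))
    return result

dict_a = {'key1': 'Large left panel', 'key2': 'Orange bear rug',
          'key3': 'Luxo jr. lamp'}
dict_b = {'keyX': 'titanium panel', 'keyY': 'orange Ball and chain',
          'keyZ': 'large bear musket'}

result = search_pairs(dict_a=dict_a, dict_b=dict_b)
print(result['bear'])
print(result['orange'])
```

With lists, two dictionaries that share a key name no longer overwrite each other's entries, at the cost of a linear scan when looking for a particular key.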