What is an efficient way to find the most common element in a Python list?
My list items may not be hashable, so I can't use a dictionary.
Also in case of draws the item with the lowest index should be returned. Example:
>>> most_common(['duck', 'duck', 'goose'])
'duck'
>>> most_common(['goose', 'duck', 'duck', 'goose'])
'goose'
A simpler one-liner:
def most_common(lst):
    return max(set(lst), key=lst.count)
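Note that set(lst) requires hashable items and has no defined order, so the one-liner above does not guarantee the lowest-index winner on ties. A minimal sketch of a variant that iterates over the list itself is shown below; the name most_common_first is made up for illustration, and the approach is still quadratic because list.count runs once per element.

```python
def most_common_first(lst):
    """Return the most frequent item; on ties, the earliest one in lst."""
    # max() returns the first maximal element it encounters, and iterating
    # over lst itself (rather than set(lst)) preserves the original order.
    # list.count only needs ==, so unhashable items such as lists work too.
    return max(lst, key=lst.count)


print(most_common_first(['goose', 'duck', 'duck', 'goose']))  # goose
print(most_common_first([[1], [2], [2]]))                     # [2]
```

This trades speed for simplicity, so it is probably only reasonable for short lists.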
Borrowing from here, this can be used with Python 2.7:
from collections import Counter
def Most_Common(lst):
    data = Counter(lst)
    return data.most_common(1)[0][0]
Works around 4-6 times faster than Alex's solutions, and is 50 times faster than the one-liner proposed by newacct.
On CPython 3.6+ (any Python 3.7+) the above will select the first seen element in case of ties. If you're running on older Python, to retrieve the element that occurs first in the list in case of ties you need to do two passes to preserve order:
# Only needed pre-3.6!
def most_common(lst):
    data = Counter(lst)
    return max(lst, key=data.get)
With so many solutions proposed, I'm amazed nobody's proposed what I'd consider an obvious one (for non-hashable but comparable elements) -- itertools.groupby. itertools offers fast, reusable functionality, and lets you delegate some tricky logic to well-tested standard library components. Consider for example:
import itertools
import operator
def most_common(L):
    # get an iterable of (item, iterable) pairs
    SL = sorted((x, i) for i, x in enumerate(L))
    # print 'SL:', SL
    groups = itertools.groupby(SL, key=operator.itemgetter(0))
    # auxiliary function to get "quality" for an item
    def _auxfun(g):
        item, iterable = g
        count = 0
        min_index = len(L)
        for _, where in iterable:
            count += 1
            min_index = min(min_index, where)
        # print 'item %r, count %r, minind %r' % (item, count, min_index)
        return count, -min_index
    # pick the highest-count/earliest item
    return max(groups, key=_auxfun)[0]
This could be written more concisely, of course, but I'm aiming for maximal clarity. The two print statements can be uncommented to better see the machinery in action; for example, with prints uncommented:
print most_common(['goose', 'duck', 'duck', 'goose'])
emits:
SL: [('duck', 1), ('duck', 2), ('goose', 0), ('goose', 3)]
item 'duck', count 2, minind 1
item 'goose', count 2, minind 0
goose
As you see, SL is a list of pairs, each pair an item followed by the item's index in the original list (to implement the key condition that, if the "most common" items with the same highest count are > 1, the result must be the earliest-occurring one).
groupby groups by the item only (via operator.itemgetter). The auxiliary function, called once per grouping during the max computation, receives and internally unpacks a group - a tuple with two items (item, iterable) where the iterable's items are also two-item tuples, (item, original index) [[the items of SL]].
Then the auxiliary function uses a loop to determine both the count of entries in the group's iterable, and the minimum original index; it returns those as combined "quality key", with the min index sign-changed so the max operation will consider "better" those items that occurred earlier in the original list.
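For readers on Python 3, the same decorate, sort and group idea can be sketched roughly as follows. The function name most_common_sorted is hypothetical, and returning None for an empty list is my own assumption rather than behaviour of the code above.

```python
from itertools import groupby
from operator import itemgetter


def most_common_sorted(items):
    # Decorate each item with its original index, then sort so that equal
    # items become adjacent (the items must be mutually comparable).
    decorated = sorted((x, i) for i, x in enumerate(items))
    best_key, best_item = None, None
    for item, group in groupby(decorated, key=itemgetter(0)):
        indices = [i for _, i in group]
        # Within a group the indices are ascending, so indices[0] is the
        # earliest occurrence. Higher count wins; on equal counts the
        # smaller first index wins because it is negated.
        key = (len(indices), -indices[0])
        if best_key is None or key > best_key:
            best_key, best_item = key, item
    return best_item  # None for an empty input (an assumption of this sketch)


print(most_common_sorted(['goose', 'duck', 'duck', 'goose']))  # goose
```

The sort dominates the cost, so this stays O(N log N) like the original.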
This code could be much simpler if it worried a little less about big-O issues in time and space, e.g....:
def most_common(L):
    groups = itertools.groupby(sorted(L))
    def _auxfun(g):
        # unpack in the body: tuple parameters were removed in Python 3
        item, iterable = g
        return len(list(iterable)), -L.index(item)
    return max(groups, key=_auxfun)[0]
same basic idea, just expressed more simply and compactly... but, alas, an extra O(N) auxiliary space (to embody the groups' iterables to lists) and O(N squared) time (to get the L.index of every item). While premature optimization is the root of all evil in programming, deliberately picking an O(N squared) approach when an O(N log N) one is available just goes too much against the grain of scalability!-)
Finally, for those who prefer "oneliners" to clarity and performance, a bonus 1-liner version with suitably mangled names:-).
from itertools import groupby as g
def most_common_oneliner(L):
    return max(g(sorted(L)), key=lambda xv: (len(list(xv[1])), -L.index(xv[0])))[0]
What you want is known in statistics as the mode, and Python's standard library (3.4+) has a function in the statistics module to do exactly that for you:
>>> from statistics import mode
>>> mode([1, 2, 2, 3, 3, 3, 3, 3, 4, 5, 6, 6, 6])
3
Note that if there is no single "most common element", such as when the top two are tied, this raises StatisticsError on Python 3.7 and earlier; from 3.8 onwards it returns the first one encountered.
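As a small illustration of the tie behaviour, here is a sketch that assumes Python 3.8 or later (statistics.multimode was added in 3.8) and hashable items:

```python
from statistics import mode, multimode  # multimode requires Python 3.8+

data = ['goose', 'duck', 'duck', 'goose']

# On 3.8+ a tie is resolved in favour of the value seen first.
print(mode(data))       # goose

# multimode returns every tied value, in order of first appearance.
print(multimode(data))  # ['goose', 'duck']
```

On 3.7 and earlier the mode call in this snippet would raise StatisticsError instead.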
Without the requirement about the lowest index, you can use collections.Counter for this:
from collections import Counter
a = [1936, 2401, 2916, 4761, 9216, 9216, 9604, 9801]
c = Counter(a)
print(c.most_common(1)) # the single most common element; pass 2 for the two most common
[(9216, 2)] # a list containing one (element, count) tuple: the element and its count in 'a'
If they are not hashable, you can sort them and do a single loop over the result counting the items (identical items will be next to each other). But it might be faster to make them hashable and use a dict.
def most_common(lst):
    cur_length = 0
    max_length = 0
    cur_i = 0
    max_i = 0
    cur_item = None
    max_item = None
    for i, item in sorted(enumerate(lst), key=lambda x: x[1]):
        if cur_item is None or cur_item != item:
            if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
                max_length = cur_length
                max_i = cur_i
                max_item = cur_item
            cur_length = 1
            cur_i = i
            cur_item = item
        else:
            cur_length += 1
    if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
        return cur_item
    return max_item
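The other option mentioned above, making the items hashable and counting with a dict, might look like the sketch below. It assumes every item is a flat list of hashable values, so that tuple(item) is a faithful hashable stand-in; the name most_common_rows is hypothetical.

```python
from collections import Counter


def most_common_rows(rows):
    # Assumption: each row is a flat list of hashable values, so converting
    # it to a tuple gives an equivalent hashable key.
    keys = [tuple(row) for row in rows]
    counts = Counter(keys)
    # max() keeps the first maximal element, so ties go to the lowest index.
    best = max(range(len(rows)), key=lambda i: counts[keys[i]])
    return rows[best]


print(most_common_rows([[1, 2], [3], [3], [1, 2]]))  # [1, 2]
```

Nested or otherwise unhashable contents would need a different key function.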
This is an O(n) solution.
mydict = {}
cnt, itm = 0, ''
for item in reversed(lst):
    mydict[item] = mydict.get(item, 0) + 1
    if mydict[item] >= cnt:
        cnt, itm = mydict[item], item
print(itm)
(reversed is used to make sure that it returns the lowest index item)
Sort a copy of the list and find the longest run. You can decorate the list before sorting it with the index of each element, and then choose the run that starts with the lowest index in the case of a tie.
A one-liner:
def most_common(lst):
    return max(((item, lst.count(item)) for item in set(lst)), key=lambda a: a[1])[0]
I am doing this using the scipy.stats module and a lambda:
import scipy.stats
lst = [1,2,3,4,5,6,7,5]
most_freq_val = lambda x: scipy.stats.mode(x)[0][0]
print(most_freq_val(lst))
Result:
5
# use Decorate, Sort, Undecorate to solve the problem
def most_common(iterable):
    # Make a list with tuples: (item, index)
    # The index will be used later to break ties for most common item.
    lst = [(x, i) for i, x in enumerate(iterable)]
    lst.sort()
    # lst_final will also be a list of tuples: (count, index, item)
    # Sorting on this list will find us the most common item, and the index
    # will break ties so the one listed first wins. Count is negative so
    # largest count will have lowest value and sort first.
    lst_final = []
    # Get an iterator for our new list...
    itr = iter(lst)
    # ...and pop the first tuple off. Setup current state vars for loop.
    count = 1
    tup = next(itr)
    x_cur, i_cur = tup
    # Loop over sorted list of tuples, counting occurrences of item.
    for tup in itr:
        # Same item again?
        if x_cur == tup[0]:
            # Yes, same item; increment count
            count += 1
        else:
            # No, new item, so write previous current item to lst_final...
            t = (-count, i_cur, x_cur)
            lst_final.append(t)
            # ...and reset current state vars for loop.
            x_cur, i_cur = tup
            count = 1
    # Write final item after loop ends
    t = (-count, i_cur, x_cur)
    lst_final.append(t)
    lst_final.sort()
    answer = lst_final[0][2]
    return answer
print(most_common(['x', 'e', 'a', 'e', 'a', 'e', 'e'])) # prints 'e'
print(most_common(['goose', 'duck', 'duck', 'goose'])) # prints 'goose'
Simple one line solution
moc = max([(lst.count(item), item) for item in set(lst)])
It returns a (frequency, element) tuple for the most frequent element; on ties the largest element wins, not the one with the lowest index.
You probably don't need this anymore, but this is what I did for a similar problem. (It looks longer than it is because of the comments.)
itemList = ['hi', 'hi', 'hello', 'bye']
counter = {}
maxItemCount = 0
for item in itemList:
    try:
        # Referencing this will cause a KeyError exception
        # if it doesn't already exist
        counter[item]
        # ... meaning if we get this far it didn't happen so
        # we'll increment
        counter[item] += 1
    except KeyError:
        # If we got a KeyError we need to create the
        # dictionary key
        counter[item] = 1
    # Keep overwriting maxItemCount with the latest number,
    # if it's higher than the existing itemCount
    if counter[item] > maxItemCount:
        maxItemCount = counter[item]
        mostPopularItem = item
print(mostPopularItem)
Building on Luiz's answer, but satisfying the "in case of draws the item with the lowest index should be returned" condition:
from collections import Counter
from statistics import mode, StatisticsError
def most_common(l):
    try:
        return mode(l)
    except StatisticsError as e:
        # Python <= 3.7 raises on ties; fall back to the earliest of the
        # tied items (returning l[0] would be wrong for e.g. [1, 2, 2, 3, 3])
        if 'no unique mode' in e.args[0]:
            counts = Counter(l)
            return max(l, key=counts.get)
        # this is for "StatisticsError: no mode for empty data"
        # after calling mode([])
        raise
Example:
>>> most_common(['a', 'b', 'b'])
'b'
>>> most_common([1, 2])
1
>>> most_common([])
StatisticsError: no mode for empty data
ans = [1, 1, 0, 0, 1, 1]
all_ans = {ans.count(ans[i]): ans[i] for i in range(len(ans))}
print(all_ans)
# prints {4: 1, 2: 0}
max_key = max(all_ans.keys())
# max_key is 4
print(all_ans[max_key])
# prints 1
import numpy as np
#This will return the list sorted by frequency:
def orderByFrequency(list):
    listUniqueValues = np.unique(list)
    listQty = []
    listOrderedByFrequency = []
    for i in range(len(listUniqueValues)):
        listQty.append(list.count(listUniqueValues[i]))
    for i in range(len(listQty)):
        index_bigger = np.argmax(listQty)
        for j in range(listQty[index_bigger]):
            listOrderedByFrequency.append(listUniqueValues[index_bigger])
        listQty[index_bigger] = -1
    return listOrderedByFrequency
#And this will return a list with the most frequent values in a list:
def getMostFrequentValues(list):
    if (len(list) <= 1):
        return list
    list_most_frequent = []
    list_ordered_by_frequency = orderByFrequency(list)
    list_most_frequent.append(list_ordered_by_frequency[0])
    frequency = list_ordered_by_frequency.count(list_ordered_by_frequency[0])
    index = 0
    while(index < len(list_ordered_by_frequency)):
        index = index + frequency
        if(index < len(list_ordered_by_frequency)):
            testValue = list_ordered_by_frequency[index]
            testValueFrequency = list_ordered_by_frequency.count(testValue)
            if (testValueFrequency == frequency):
                list_most_frequent.append(testValue)
            else:
                break
    return list_most_frequent
#tests:
print(getMostFrequentValues([]))
print(getMostFrequentValues([1]))
print(getMostFrequentValues([1,1]))
print(getMostFrequentValues([2,1]))
print(getMostFrequentValues([2,2,1]))
print(getMostFrequentValues([1,2,1,2]))
print(getMostFrequentValues([1,2,1,2,2]))
print(getMostFrequentValues([3,2,3,5,6,3,2,2]))
print(getMostFrequentValues([1,2,2,60,50,3,3,50,3,4,50,4,4,60,60]))
Results:
[]
[1]
[1]
[1, 2]
[2]
[1, 2]
[2]
[2, 3]
[3, 4, 50, 60]
Here:
def most_common(l):
    max = 0
    maxitem = None
    for x in set(l):
        count = l.count(x)
        if count > max:
            max = count
            maxitem = x
    return maxitem
I have a vague feeling there is a method somewhere in the standard library that will give you the count of each element, but I can't find it.
This is the obvious slow solution (O(n^2)) if neither sorting nor hashing is feasible, but equality comparison (==) is available:
def most_common(items):
    if not items:
        raise ValueError
    fitems = []
    best_idx = 0
    for item in items:
        item_missing = True
        i = 0
        for fitem in fitems:
            if fitem[0] == item:
                fitem[1] += 1
                d = fitem[1] - fitems[best_idx][1]
                if d > 0 or (d == 0 and fitems[best_idx][2] > fitem[2]):
                    best_idx = i
                item_missing = False
                break
            i += 1
        if item_missing:
            fitems.append([item, 1, i])
    # best_idx is an index into fitems (the distinct items), not into items
    return fitems[best_idx][0]
But making your items hashable or sortable (as recommended by other answers) would almost always make finding the most common element faster if the length of your list (n) is large. O(n) on average with hashing, and O(n*log(n)) at worst for sorting.
>>> li = ['goose', 'duck', 'duck']
>>> def foo(li):
...     st = set(li)
...     mx = -1
...     for each in st:
...         temp = li.count(each)
...         if mx < temp:
...             mx = temp
...             h = each
...     return h
...
>>> foo(li)
'duck'
I needed to do this in a recent program. I'll admit it, I couldn't understand Alex's answer, so this is what I ended up with.
def mostPopular(l):
    mpEl=None
    mpIndex=0
    mpCount=0
    curEl=None
    curCount=0
    for i, el in sorted(enumerate(l), key=lambda x: (x[1], x[0]), reverse=True):
        curCount=curCount+1 if el==curEl else 1
        curEl=el
        if curCount>mpCount \
        or (curCount==mpCount and i<mpIndex):
            mpEl=curEl
            mpIndex=i
            mpCount=curCount
    return mpEl, mpCount, mpIndex
I timed it against Alex's solution and it's about 10-15% faster for short lists, but once you go over 100 elements or more (tested up to 200000) it's about 20% slower.
def most_frequent(List):
    counter = 0
    num = List[0]
    for i in List:
        curr_frequency = List.count(i)
        if(curr_frequency> counter):
            counter = curr_frequency
            num = i
    return num
List = [2, 1, 2, 2, 1, 3]
print(most_frequent(List))
Hi, this is a very simple solution. Note that it runs in quadratic time, not linear time, because L.count is called once for every element:
L = ['goose', 'duck', 'duck']
def most_common(L):
    current_winner = 0
    max_repeated = None
    for i in L:
        amount_times = L.count(i)
        if amount_times > current_winner:
            current_winner = amount_times
            max_repeated = i
    return max_repeated
print(most_common(L))
duck
Here max_repeat_num is the element that occurs most often in the list, and max_repeat is how many times it occurs:
numbers = [1, 3, 7, 4, 3, 0, 3, 6, 3]
max_repeat_num = max(numbers, key=numbers.count)  # which number occurs most frequently
max_repeat = numbers.count(max_repeat_num)  # how many times
print(f"the number {max_repeat_num} is repeated {max_repeat} times")
def mostCommonElement(list):
    count = {}  # dict holder
    max = 0  # keep track of the count by key
    result = None  # holder when count is greater than max
    for i in list:
        if i not in count:
            count[i] = 1
        else:
            count[i] += 1
        if count[i] > max:
            max = count[i]
            result = i
    return result
mostCommonElement(["a","b","a","c"])  # -> "a"
If "most common" means the majority element, i.e. the one that appears more than N/2 times in the array, where N is len(array), the technique below finds it in O(n) time. Note that Counter needs O(k) auxiliary space for k distinct elements rather than O(1), and that the function returns -1 when no element has a majority.
from collections import Counter
def majorityElement(arr):
    majority_elem = Counter(arr)
    size = len(arr)
    for key, val in majority_elem.items():
        if val > size/2:
            return key
    return -1
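If constant auxiliary space really matters, one well-known alternative is the Boyer-Moore majority vote. The sketch below is not the method used above; the name majority_element is hypothetical, it returns None rather than -1 when there is no majority, and it assumes arr is a list that can be traversed twice.

```python
def majority_element(arr):
    # First pass: pair off differing elements. If a majority element
    # exists, it is the only possible survivor.
    candidate, votes = None, 0
    for x in arr:
        if votes == 0:
            candidate, votes = x, 1
        elif x == candidate:
            votes += 1
        else:
            votes -= 1
    # Second pass: confirm the candidate really occurs more than N/2 times.
    if arr and sum(1 for x in arr if x == candidate) > len(arr) / 2:
        return candidate
    return None  # no majority element (an assumption of this sketch)


print(majority_element([2, 2, 1, 2, 3]))  # 2
print(majority_element([1, 2, 3]))        # None
```

It only needs equality comparison, so the elements do not have to be hashable.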
def most_common(lst):
    if max([lst.count(i) for i in lst]) == 1:
        return False
    else:
        return max(set(lst), key=lst.count)
def popular(L):
    C={}
    for a in L:
        C[a]=L.count(a)
    for b in C.keys():
        if C[b]==max(C.values()):
            return b
L=[2,3,5,3,6,3,6,3,6,3,7,467,4,7,4]
print(popular(L))
Given the names and grades for each student in a Physics class, store them in a nested list and print the name(s) of any student(s) having the second lowest grade.
Note: If there are multiple students with the same grade, order their names alphabetically and print each name on a new line.
Input Format
The first line contains an integer, the number of students.
The subsequent lines describe each student over two lines; the first line contains a student's name, and the second line contains their grade.
Constraints
There will always be one or more students having the second lowest grade.
Output Format
Print the name(s) of any student(s) having the second lowest grade in Physics; if there are multiple students, order their names alphabetically and print each one on a new line.
This is my code:
list = []
for _ in range(int(input())):
    name = input()
    score = float(input())
    new = [name, score]
    list.append(new)
def snd_highest(val):
    return val[1]
list.sort(key = snd_highest)
list.sort()
value = list[1]
grade = value[1]
for a,b in list:
    if b == grade:
        print (a)
This is the test case:
4
Rachel
-50
Mawer
-50
Sheen
-50
Shaheen
51
The expected output is Shaheen, but I got the other three names.
Please explain.
To find the second lowest value, you have just sorted your list in ascending order and taken the second entry, using the code below:
value = list[1]
grade = value[1]
Imagine this is your list after sorting by grade:
[['Rachel', -50.0], ['Mawer', -50.0], ['Sheen', -50.0], ['Shaheen', 51.0]]
According to value = list[1], the program chooses value = ['Mawer', -50.0].
The rest of your program then takes the grade from this value and prints every name with that grade, which is why you get the three students with -50. The program simply assumes that the second-lowest grade sits at the second position in the list, and that fails whenever the lowest grade occurs more than once. You need logic that first finds the lowest grade and then finds the next distinct grade above it.
Try doing this
if __name__ == '__main__':
    students = []
    for _ in range(int(input())):
        name = input()
        score = float(input())
        new = [name, score]
        students.append(new)
    def removeMinimum(oldlist):
        oldlist = sorted(oldlist, key=lambda x: x[1])
        # use the argument, not the global list, to find the minimum
        min_ = min(oldlist, key=lambda x: x[1])
        newlist = []
        for a in range(0, len(oldlist)):
            if min_[1] != oldlist[a][1]:
                newlist.append(oldlist[a])
        return newlist
    students = removeMinimum(students)
    # find the second minimum value
    min_ = min(students, key=lambda x: x[1])
    # sort alphabetic order
    students = sorted(students, key=lambda x: x[0])
    for a in range(0, len(students)):
        if min_[1] == students[a][1]:
            print(students[a][0])
I hope this may help you to pass all your test cases. Thank you.
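The same idea, taking the second distinct grade rather than the second list entry, can also be sketched more compactly. The function name second_lowest_names is made up for illustration, and the data is hard-coded from the test case in the question instead of being read from input.

```python
def second_lowest_names(students):
    # students is a list of [name, score] pairs.
    # A set drops duplicate scores, so index 1 is the second DISTINCT grade.
    scores = sorted({score for _, score in students})
    if len(scores) < 2:
        return []  # no second-lowest grade exists
    target = scores[1]
    return sorted(name for name, score in students if score == target)


students = [['Rachel', -50.0], ['Mawer', -50.0],
            ['Sheen', -50.0], ['Shaheen', 51.0]]
for name in second_lowest_names(students):
    print(name)  # Shaheen
```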
# These functions will be used for sorting
def getSecond(ele):
    return ele[1]
def getFirst(ele):
    return ele[0]
studentList = []
sortedList = []
secondLowestStudents = []
# Read input from STDIN and save it in a nested list [["stud1", <score>], ["stud2", <score>]]
for _ in range(int(input())):
    name = input()
    score = float(input())
    studentList.append([name, score])
# Sort the list by score and collect the distinct scores, in ascending order, in sortedList
studentList.sort(key=getSecond)
for x in studentList:
    if x[1] not in sortedList:
        sortedList.append(x[1])
# Get the second lowest grade
secondLowest = sortedList[1]
# Now sort the original list by name and fetch the students having the secondLowest grade
studentList.sort(key=getFirst)
for x in studentList:
    if x[1] == secondLowest:
        secondLowestStudents.append(x[0])
# Print the names of the students having the second-lowest grade
for st in secondLowestStudents:
    print(st)
I'm currently writing a class called SMS_store(). In it, I have a method called delete.
Delete is simply supposed to make sure the user has given me a valid integer. If so, it's supposed to pop an item from the list.
class SMS_store():
    def __init__(self):
        self.__inbox = []
    def delete(self, i):
        if i >= len(self.__inbox):
            return None
        else:
            self.__inbox.pop[i]
Whenever I run the code in my test program, I run into two errors at my delete stage:
1) If I type myInbox.delete(2) when there are only 2 items in the list, I get "list index out of range", and I thought I was protected from that error. myInbox.delete(3) gives me None.
2) If I type myInbox.delete(1) when there's a valid index 1 in my list, it says global name 'msg' not defined. I don't get why I'm seeing that error.
Here's my full class code.
#SMS_store class
"""
Pre-condition: SMS_store class is instantiated in client code.
Post-condition: SMS_store class is instantiated.
"""
class SMS_store():
    #Object instantiation
    """
    Pre-condition: SMS_store class is instantiated in client code.
    Post-condition: Object creates an empty list.
    """
    def __init__(self):
        self.__inbox = []
    #add_new_arrival method
    """
    Pre-condition: Class method is handed a valid phone number of 11, 10, or 7
    digits as a string with no hyphens or letters, a string containing a time,
    and a string containing the text of a message.
    Post-condition: Method will append a tuple containing False for an
    unread message, the phone number, the time arrived and the text of the
    message to the class created list.
    """
    def add_new_arrival(self, from_number, time_arrived, text_of_SMS):
        number = from_number
        #Check for valid phone number and add hyphens based on number length
        if len(number) == 11:
            number = number[0] + "-" + number[1:4] + "-" + number[4:7] + "-"\
            + number[7:]
        elif len(number) == 7:
            number = number[:3] + "-" + number[3:]
        elif len(number) == 10:
            number = "1-" + number[:3] + "-" + number[3:6] + "-" + number[6:]
        elif number.isalpha():
            number = "Invalid number"
        else:
            number = "Invalid number"
        time = time_arrived
        text = text_of_SMS
        message = (False, number, time, text)
        self.__inbox.append(message)
    #message_count method
    """
    Post-condition: method returns the number of tuples in class created list.
    Returns None if list is empty.
    """
    def message_count(self):
        count = len(self.__inbox)
        if count == 0:
            return None
        else:
            return count
    #get_unread_indexes method
    """
    Post-condition: method creates an empty list, checks for any tuples with
    "False" at index 0. If "False" is found, it appends the index for the
    tuple in the list. Method returns list of indexes.
    """
    def get_unread_indexes(self):
        unread = []
        for message in self.__inbox:
            if message[0] == False:
                unread.append(self.__inbox.index(message))
        return unread
    #get_message method
    """
    Pre-condition: Method is passed an integer.
    Post-condition: Method checks for a valid index number. If valid, the
    method will then check if indexed tuple contains "True" or "False" at index
    0. If True, message is returned in new tuple containing items from indexes
    1, 2, and 3. If False, a new tuple is created containing "True"
    indicating the message is now read, plus indexes 1, 2, and 3 from the
    original called tuple.
    """
    def get_message(self, i):
        #check for valid index number
        if i >= len(self.__inbox):
            return None
        else:
            msg = self.__inbox[i]
            if msg[0] == True:
                return (msg[1], msg[2], msg[3])
            #create new tuple with True, and index 1-3 from original tuple
            else:
                self.__inbox.pop(i)
                newMsg = (True, msg[1], msg[2], msg[3])
                self.__inbox.insert(i, newMsg)
                return newMsg[1:]
    #delete method
    """
    Pre-condition: Method is passed an integer.
    Post-condition: Method checks that the integer is a valid index number. If
    valid, method pops index from class created list.
    """
    def delete(self, i):
        if i >= len(self.__inbox):
            return None
        else:
            self.__inbox.pop(i)
    #Clear method
    """
    Post-condition: method resets the inbox to an empty list.
    """
    def clear(self):
        self.__inbox = []
Here's how I am using the code in my test program:
#Test instantiation
naomisInbox = SMS_store()
martisInbox = SMS_store()
#Test add_new_arrival
naomisInbox.add_new_arrival("12345678912", "10:38PM", "Yay! Sorry, been")
martisInbox.add_new_arrival("23456789123", "10:37PM", "Hey I finally hit 90")
martisInbox.add_new_arrival("12345678912", "10:40PM", "Now I sleep :)")
naomisInbox.add_new_arrival("23456789123", "10:40PM", "Night")
#Test message_count
count = naomisInbox.message_count()
print("Naomi has", count, "messages in her inbox.")
count = martisInbox.message_count()
print("Marti has", count, "messages in his inbox.\n")
#Test get_unread_indexes
numUnread = naomisInbox.get_unread_indexes()
print("Naomi has unread messages at indexes: ", numUnread)
numUnread = martisInbox.get_unread_indexes()
print("Marti has unread messages at indexes: ", numUnread,"\n")
#Test get_message
msg = naomisInbox.get_message(9)
print("Getting message from Naomi's inbox at index [9]: ")
if msg == None:
    print("No message at that index.")
else:
    for item in msg:
        print(item)
print("\n")
numUnread = naomisInbox.get_unread_indexes()
print("Naomi now has unread messages at indexes: ", numUnread, "\n")
msg = martisInbox.get_message(1)
print("Getting message from Marti's inbox at index [1]:")
for item in msg:
    print(item)
print("\n")
numUnread = martisInbox.get_unread_indexes()
print("Marti now has unread messages at indexes: ", numUnread, "\n")
#Test delete
remove = naomisInbox.delete(0)
if remove == None:
    print("Invalid index.")
count = naomisInbox.message_count()
numUnread = naomisInbox.get_unread_indexes()
print("Naomi now has", count, "messages with unread messages at index: ",\
      numUnread)
#Test clear
print("\nAfter clearing: ")
naomisInbox.clear()
count = naomisInbox.message_count()
print("Naomi now has", count, "messages in her inbox.")
martisInbox.clear()
count = martisInbox.message_count()
print("Marti now has", count, "messages in his inbox.")
Error:
Traceback (most recent call last):
File "/home/theriddler/Documents/CSIS153/Assignments/Nansen3/Nansen3.py", line 56, in <module>
remove = naomisInbox.delete(0)
File "/home/theriddler/Documents/CSIS153/Assignments/Nansen3/modSMS.py", line 125, in delete
NameError: global name 'msg' is not defined
Any help is appreciated. Sorry if it's a repeated question. Thanks, Blackwell.
For your first problem:
1) If there are only two items in the list, you cannot delete the second item by passing 2 as the index; list indexes start at 0, so it should be 1.
2) Your second problem suggests that the name msg is being used inside a function of the SMS_store class where it has not been defined (or stored on self). However, I can't find anything like that in the code you posted. You should probably check it again, as it works fine on my machine.
Now a little more light on your delete method:
def delete(self, i):
if i >= len(self.__inbox):
return None
else:
self.__inbox.pop(i)
Here, if you always want to delete the last message, just use self.__inbox.pop() without passing any index. If you want callers to pass a 1-based message number instead, you should do self.__inbox.pop(i-1),
because with 1-based numbering the number of the last message equals the length of the list, so the current check (i >= len) returns None and the else branch is never executed for it.
Also, your delete method explicitly returns None in the if branch, but when the else branch runs, None is again returned by default, so:
remove = naomisInbox.delete(0)
if remove == None:
print("Invalid index.")
This will always print "Invalid index.", even when the message gets deleted.
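One way to make the result distinguishable is to return a boolean from delete. The following is only a sketch under a few assumptions: the class name Inbox and the add method are hypothetical stand-ins for SMS_store, indexes are 0-based as in the question's test program, and negative indexes are rejected.

```python
class Inbox:
    """Hypothetical stand-in for SMS_store, reduced to what delete needs."""

    def __init__(self):
        self._messages = []

    def add(self, message):
        self._messages.append(message)

    def delete(self, i):
        """Remove the message at 0-based index i; return True on success."""
        # Reject anything outside 0 .. len-1, including negative indexes,
        # which list.pop would otherwise accept silently.
        if not 0 <= i < len(self._messages):
            return False
        self._messages.pop(i)
        return True


inbox = Inbox()
inbox.add("first")
inbox.add("second")
print(inbox.delete(2))  # False: only indexes 0 and 1 exist
print(inbox.delete(1))  # True: "second" is removed
```

The caller can then test the return value with "if not inbox.delete(i):" instead of comparing against None.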