What's the underlying implementation for most_common method of Counter? - python-3.x

I found a pyi file which has the following def
def most_common(self, n: Optional[int] = ...) -> List[Tuple[_T, int]]: ...
How could this happen? List is not defined, and no implementation?
Just highlight some valuable suggestions here for followers:
List is imported from the typing module; it's not the same thing as list. The .pyi file doesn't need to import it because stub files are never executed; they just have to be syntactically valid Python
If you use from future import annotations, you won't have to import typing to use List et al. in function annotations in .py files, either, since function annotations will be treated as string literals. (Starting in Python 4, that will be the default behavior. See PEP 563 for details.)

You are looking at the pyi file which is used solely for annotations. It is never executed by the Python interpreter. You can learn more about pyi files by reading PEP484.
Using a debugger, put a breakpoint on the line where you call most_commonand then step into the method.
Python 3.7 implementation.
...\Lib\collections\__init__.py:
def most_common(self, n=None):
'''List the n most common elements and their counts from the most
common to the least. If n is None, then list all element counts.
>>> Counter('abcdeabcdabcaba').most_common(3)
[('a', 5), ('b', 4), ('c', 3)]
'''
# Emulate Bag.sortedByCount from Smalltalk
if n is None:
return sorted(self.items(), key=_itemgetter(1), reverse=True)
return _heapq.nlargest(n, self.items(), key=_itemgetter(1))
_heapq.nlargest (in ...\Lib\heapq.py) implementation:
def nlargest(n, iterable, key=None):
"""Find the n largest elements in a dataset.
Equivalent to: sorted(iterable, key=key, reverse=True)[:n]
"""
# Short-cut for n==1 is to use max()
if n == 1:
it = iter(iterable)
sentinel = object()
if key is None:
result = max(it, default=sentinel)
else:
result = max(it, default=sentinel, key=key)
return [] if result is sentinel else [result]
# When n>=size, it's faster to use sorted()
try:
size = len(iterable)
except (TypeError, AttributeError):
pass
else:
if n >= size:
return sorted(iterable, key=key, reverse=True)[:n]
# When key is none, use simpler decoration
if key is None:
it = iter(iterable)
result = [(elem, i) for i, elem in zip(range(0, -n, -1), it)]
if not result:
return result
heapify(result)
top = result[0][0]
order = -n
_heapreplace = heapreplace
for elem in it:
if top < elem:
_heapreplace(result, (elem, order))
top, _order = result[0]
order -= 1
result.sort(reverse=True)
return [elem for (elem, order) in result]
# General case, slowest method
it = iter(iterable)
result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)]
if not result:
return result
heapify(result)
top = result[0][0]
order = -n
_heapreplace = heapreplace
for elem in it:
k = key(elem)
if top < k:
_heapreplace(result, (k, order, elem))
top, _order, _elem = result[0]
order -= 1
result.sort(reverse=True)
return [elem for (k, order, elem) in result]

Related

How get corresponding element in sub list based on input element from the same list python

I've list like this.
list=[[['name1','name2'],[n1,n2]], [['name3','name4'],[n3,n4]]]
I want to get n1 if input is name1
similarly if input if name3 then output should be n3
Note: name1-Type str
n1- Type int
Is there is any way to do this?..Pls suggest me solution/Solution steps that i can follow to solve this issue..
I see building an intermediate lookup dict from my_list, then looking up as you like:
my_list=[
[['name1','name2'],['n1','n2']],
[['name3','name4'],['n3','n4']]
]
lookup = {}
for double_tuple in my_list:
lhs, rhs = double_tuple
zipped = zip(lhs, rhs) # ['name1','name2'],['n1','n2'] → ['name1', 'n1'],['name2','n2']
lookup.update(dict(zipped))
print(lookup['name1']) # → 'n1'
It can be easily solved with a list comprehension:
unpack the elements in the list
filter for k1 == input
get first result, if exists
input_ = "name1"
list_ = [[['name1','name2'],[n1,n2]], [['name3','name4'],[n3,n4]]]
candidates = [v1
for (k1, _), (v1, _) in list_
if k1 == input_]
if len(candidates) == 0:
print("No such key: " + input_)
else:
print("Value is " + candidates[0])
Note: I used trailing underscores in the names to avoid overwriting builtin functions (list and input). Overwriting builtin functions is bad practice.
You can use filter combined with next:
def get_item_from_key(input_list, key):
"""Return item corresponding to a specific key"""
try:
return next(filter(lambda x: x[0][0] == key, input_list))[1][0]
except StopIteration:
return None
So, if the input list is a = [[['name1', 'name2'], [0, 1]], [['name3', 'name4'], [2, 3]]], you can ask for any key you are interested into:
get_item_from_key(a, 'name1') # this will return 0
get_item_from_key(a, 'name3') # this will return 2
get_item_from_key(a, 'name2') # this will return None
get_item_from_key(a, 'name5') # this will return None

Need to fetch 1st value from the dictionary from all the preferred keys in python [duplicate]

What is an efficient way to find the most common element in a Python list?
My list items may not be hashable so can't use a dictionary.
Also in case of draws the item with the lowest index should be returned. Example:
>>> most_common(['duck', 'duck', 'goose'])
'duck'
>>> most_common(['goose', 'duck', 'duck', 'goose'])
'goose'
A simpler one-liner:
def most_common(lst):
return max(set(lst), key=lst.count)
Borrowing from here, this can be used with Python 2.7:
from collections import Counter
def Most_Common(lst):
data = Counter(lst)
return data.most_common(1)[0][0]
Works around 4-6 times faster than Alex's solutions, and is 50 times faster than the one-liner proposed by newacct.
On CPython 3.6+ (any Python 3.7+) the above will select the first seen element in case of ties. If you're running on older Python, to retrieve the element that occurs first in the list in case of ties you need to do two passes to preserve order:
# Only needed pre-3.6!
def most_common(lst):
data = Counter(lst)
return max(lst, key=data.get)
With so many solutions proposed, I'm amazed nobody's proposed what I'd consider an obvious one (for non-hashable but comparable elements) -- [itertools.groupby][1]. itertools offers fast, reusable functionality, and lets you delegate some tricky logic to well-tested standard library components. Consider for example:
import itertools
import operator
def most_common(L):
# get an iterable of (item, iterable) pairs
SL = sorted((x, i) for i, x in enumerate(L))
# print 'SL:', SL
groups = itertools.groupby(SL, key=operator.itemgetter(0))
# auxiliary function to get "quality" for an item
def _auxfun(g):
item, iterable = g
count = 0
min_index = len(L)
for _, where in iterable:
count += 1
min_index = min(min_index, where)
# print 'item %r, count %r, minind %r' % (item, count, min_index)
return count, -min_index
# pick the highest-count/earliest item
return max(groups, key=_auxfun)[0]
This could be written more concisely, of course, but I'm aiming for maximal clarity. The two print statements can be uncommented to better see the machinery in action; for example, with prints uncommented:
print most_common(['goose', 'duck', 'duck', 'goose'])
emits:
SL: [('duck', 1), ('duck', 2), ('goose', 0), ('goose', 3)]
item 'duck', count 2, minind 1
item 'goose', count 2, minind 0
goose
As you see, SL is a list of pairs, each pair an item followed by the item's index in the original list (to implement the key condition that, if the "most common" items with the same highest count are > 1, the result must be the earliest-occurring one).
groupby groups by the item only (via operator.itemgetter). The auxiliary function, called once per grouping during the max computation, receives and internally unpacks a group - a tuple with two items (item, iterable) where the iterable's items are also two-item tuples, (item, original index) [[the items of SL]].
Then the auxiliary function uses a loop to determine both the count of entries in the group's iterable, and the minimum original index; it returns those as combined "quality key", with the min index sign-changed so the max operation will consider "better" those items that occurred earlier in the original list.
This code could be much simpler if it worried a little less about big-O issues in time and space, e.g....:
def most_common(L):
groups = itertools.groupby(sorted(L))
def _auxfun((item, iterable)):
return len(list(iterable)), -L.index(item)
return max(groups, key=_auxfun)[0]
same basic idea, just expressed more simply and compactly... but, alas, an extra O(N) auxiliary space (to embody the groups' iterables to lists) and O(N squared) time (to get the L.index of every item). While premature optimization is the root of all evil in programming, deliberately picking an O(N squared) approach when an O(N log N) one is available just goes too much against the grain of scalability!-)
Finally, for those who prefer "oneliners" to clarity and performance, a bonus 1-liner version with suitably mangled names:-).
from itertools import groupby as g
def most_common_oneliner(L):
return max(g(sorted(L)), key=lambda(x, v):(len(list(v)),-L.index(x)))[0]
What you want is known in statistics as mode, and Python of course has a built-in function to do exactly that for you:
>>> from statistics import mode
>>> mode([1, 2, 2, 3, 3, 3, 3, 3, 4, 5, 6, 6, 6])
3
Note that if there is no "most common element" such as cases where the top two are tied, this will raise StatisticsError on Python
<=3.7, and on 3.8 onwards it will return the first one encountered.
Without the requirement about the lowest index, you can use collections.Counter for this:
from collections import Counter
a = [1936, 2401, 2916, 4761, 9216, 9216, 9604, 9801]
c = Counter(a)
print(c.most_common(1)) # the one most common element... 2 would mean the 2 most common
[(9216, 2)] # a set containing the element, and it's count in 'a'
If they are not hashable, you can sort them and do a single loop over the result counting the items (identical items will be next to each other). But it might be faster to make them hashable and use a dict.
def most_common(lst):
cur_length = 0
max_length = 0
cur_i = 0
max_i = 0
cur_item = None
max_item = None
for i, item in sorted(enumerate(lst), key=lambda x: x[1]):
if cur_item is None or cur_item != item:
if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
max_length = cur_length
max_i = cur_i
max_item = cur_item
cur_length = 1
cur_i = i
cur_item = item
else:
cur_length += 1
if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
return cur_item
return max_item
This is an O(n) solution.
mydict = {}
cnt, itm = 0, ''
for item in reversed(lst):
mydict[item] = mydict.get(item, 0) + 1
if mydict[item] >= cnt :
cnt, itm = mydict[item], item
print itm
(reversed is used to make sure that it returns the lowest index item)
Sort a copy of the list and find the longest run. You can decorate the list before sorting it with the index of each element, and then choose the run that starts with the lowest index in the case of a tie.
A one-liner:
def most_common (lst):
return max(((item, lst.count(item)) for item in set(lst)), key=lambda a: a[1])[0]
I am doing this using scipy stat module and lambda:
import scipy.stats
lst = [1,2,3,4,5,6,7,5]
most_freq_val = lambda x: scipy.stats.mode(x)[0][0]
print(most_freq_val(lst))
Result:
most_freq_val = 5
# use Decorate, Sort, Undecorate to solve the problem
def most_common(iterable):
# Make a list with tuples: (item, index)
# The index will be used later to break ties for most common item.
lst = [(x, i) for i, x in enumerate(iterable)]
lst.sort()
# lst_final will also be a list of tuples: (count, index, item)
# Sorting on this list will find us the most common item, and the index
# will break ties so the one listed first wins. Count is negative so
# largest count will have lowest value and sort first.
lst_final = []
# Get an iterator for our new list...
itr = iter(lst)
# ...and pop the first tuple off. Setup current state vars for loop.
count = 1
tup = next(itr)
x_cur, i_cur = tup
# Loop over sorted list of tuples, counting occurrences of item.
for tup in itr:
# Same item again?
if x_cur == tup[0]:
# Yes, same item; increment count
count += 1
else:
# No, new item, so write previous current item to lst_final...
t = (-count, i_cur, x_cur)
lst_final.append(t)
# ...and reset current state vars for loop.
x_cur, i_cur = tup
count = 1
# Write final item after loop ends
t = (-count, i_cur, x_cur)
lst_final.append(t)
lst_final.sort()
answer = lst_final[0][2]
return answer
print most_common(['x', 'e', 'a', 'e', 'a', 'e', 'e']) # prints 'e'
print most_common(['goose', 'duck', 'duck', 'goose']) # prints 'goose'
Simple one line solution
moc= max([(lst.count(chr),chr) for chr in set(lst)])
It will return most frequent element with its frequency.
You probably don't need this anymore, but this is what I did for a similar problem. (It looks longer than it is because of the comments.)
itemList = ['hi', 'hi', 'hello', 'bye']
counter = {}
maxItemCount = 0
for item in itemList:
try:
# Referencing this will cause a KeyError exception
# if it doesn't already exist
counter[item]
# ... meaning if we get this far it didn't happen so
# we'll increment
counter[item] += 1
except KeyError:
# If we got a KeyError we need to create the
# dictionary key
counter[item] = 1
# Keep overwriting maxItemCount with the latest number,
# if it's higher than the existing itemCount
if counter[item] > maxItemCount:
maxItemCount = counter[item]
mostPopularItem = item
print mostPopularItem
Building on Luiz's answer, but satisfying the "in case of draws the item with the lowest index should be returned" condition:
from statistics import mode, StatisticsError
def most_common(l):
try:
return mode(l)
except StatisticsError as e:
# will only return the first element if no unique mode found
if 'no unique mode' in e.args[0]:
return l[0]
# this is for "StatisticsError: no mode for empty data"
# after calling mode([])
raise
Example:
>>> most_common(['a', 'b', 'b'])
'b'
>>> most_common([1, 2])
1
>>> most_common([])
StatisticsError: no mode for empty data
ans = [1, 1, 0, 0, 1, 1]
all_ans = {ans.count(ans[i]): ans[i] for i in range(len(ans))}
print(all_ans)
all_ans={4: 1, 2: 0}
max_key = max(all_ans.keys())
4
print(all_ans[max_key])
1
#This will return the list sorted by frequency:
def orderByFrequency(list):
listUniqueValues = np.unique(list)
listQty = []
listOrderedByFrequency = []
for i in range(len(listUniqueValues)):
listQty.append(list.count(listUniqueValues[i]))
for i in range(len(listQty)):
index_bigger = np.argmax(listQty)
for j in range(listQty[index_bigger]):
listOrderedByFrequency.append(listUniqueValues[index_bigger])
listQty[index_bigger] = -1
return listOrderedByFrequency
#And this will return a list with the most frequent values in a list:
def getMostFrequentValues(list):
if (len(list) <= 1):
return list
list_most_frequent = []
list_ordered_by_frequency = orderByFrequency(list)
list_most_frequent.append(list_ordered_by_frequency[0])
frequency = list_ordered_by_frequency.count(list_ordered_by_frequency[0])
index = 0
while(index < len(list_ordered_by_frequency)):
index = index + frequency
if(index < len(list_ordered_by_frequency)):
testValue = list_ordered_by_frequency[index]
testValueFrequency = list_ordered_by_frequency.count(testValue)
if (testValueFrequency == frequency):
list_most_frequent.append(testValue)
else:
break
return list_most_frequent
#tests:
print(getMostFrequentValues([]))
print(getMostFrequentValues([1]))
print(getMostFrequentValues([1,1]))
print(getMostFrequentValues([2,1]))
print(getMostFrequentValues([2,2,1]))
print(getMostFrequentValues([1,2,1,2]))
print(getMostFrequentValues([1,2,1,2,2]))
print(getMostFrequentValues([3,2,3,5,6,3,2,2]))
print(getMostFrequentValues([1,2,2,60,50,3,3,50,3,4,50,4,4,60,60]))
Results:
[]
[1]
[1]
[1, 2]
[2]
[1, 2]
[2]
[2, 3]
[3, 4, 50, 60]
Here:
def most_common(l):
max = 0
maxitem = None
for x in set(l):
count = l.count(x)
if count > max:
max = count
maxitem = x
return maxitem
I have a vague feeling there is a method somewhere in the standard library that will give you the count of each element, but I can't find it.
This is the obvious slow solution (O(n^2)) if neither sorting nor hashing is feasible, but equality comparison (==) is available:
def most_common(items):
if not items:
raise ValueError
fitems = []
best_idx = 0
for item in items:
item_missing = True
i = 0
for fitem in fitems:
if fitem[0] == item:
fitem[1] += 1
d = fitem[1] - fitems[best_idx][1]
if d > 0 or (d == 0 and fitems[best_idx][2] > fitem[2]):
best_idx = i
item_missing = False
break
i += 1
if item_missing:
fitems.append([item, 1, i])
return items[best_idx]
But making your items hashable or sortable (as recommended by other answers) would almost always make finding the most common element faster if the length of your list (n) is large. O(n) on average with hashing, and O(n*log(n)) at worst for sorting.
>>> li = ['goose', 'duck', 'duck']
>>> def foo(li):
st = set(li)
mx = -1
for each in st:
temp = li.count(each):
if mx < temp:
mx = temp
h = each
return h
>>> foo(li)
'duck'
I needed to do this in a recent program. I'll admit it, I couldn't understand Alex's answer, so this is what I ended up with.
def mostPopular(l):
mpEl=None
mpIndex=0
mpCount=0
curEl=None
curCount=0
for i, el in sorted(enumerate(l), key=lambda x: (x[1], x[0]), reverse=True):
curCount=curCount+1 if el==curEl else 1
curEl=el
if curCount>mpCount \
or (curCount==mpCount and i<mpIndex):
mpEl=curEl
mpIndex=i
mpCount=curCount
return mpEl, mpCount, mpIndex
I timed it against Alex's solution and it's about 10-15% faster for short lists, but once you go over 100 elements or more (tested up to 200000) it's about 20% slower.
def most_frequent(List):
counter = 0
num = List[0]
for i in List:
curr_frequency = List.count(i)
if(curr_frequency> counter):
counter = curr_frequency
num = i
return num
List = [2, 1, 2, 2, 1, 3]
print(most_frequent(List))
Hi this is a very simple solution, with linear time complexity
L = ['goose', 'duck', 'duck']
def most_common(L):
current_winner = 0
max_repeated = None
for i in L:
amount_times = L.count(i)
if amount_times > current_winner:
current_winner = amount_times
max_repeated = i
return max_repeated
print(most_common(L))
"duck"
Where number, is the element in the list that repeats most of the time
numbers = [1, 3, 7, 4, 3, 0, 3, 6, 3]
max_repeat_num = max(numbers, key=numbers.count) *# which number most* frequently
max_repeat = numbers.count(max_repeat_num) *#how many times*
print(f" the number {max_repeat_num} is repeated{max_repeat} times")
def mostCommonElement(list):
count = {} // dict holder
max = 0 // keep track of the count by key
result = None // holder when count is greater than max
for i in list:
if i not in count:
count[i] = 1
else:
count[i] += 1
if count[i] > max:
max = count[i]
result = i
return result
mostCommonElement(["a","b","a","c"]) -> "a"
The most common element should be the one which is appearing more than N/2 times in the array where N being the len(array). The below technique will do it in O(n) time complexity, with just consuming O(1) auxiliary space.
from collections import Counter
def majorityElement(arr):
majority_elem = Counter(arr)
size = len(arr)
for key, val in majority_elem.items():
if val > size/2:
return key
return -1
def most_common(lst):
if max([lst.count(i)for i in lst]) == 1:
return False
else:
return max(set(lst), key=lst.count)
def popular(L):
C={}
for a in L:
C[a]=L.count(a)
for b in C.keys():
if C[b]==max(C.values()):
return b
L=[2,3,5,3,6,3,6,3,6,3,7,467,4,7,4]
print popular(L)

How search an unordered list for a key using reduce?

I have a basic reduce function and I want to reduce a list in order to check if an item is in the list. I have defined the function below where f is a comparison function, id_ is the item I am searching for, and a is the list. For example, reduce(f, 2, [1, 6, 2, 7]) would return True since 2 is in the list.
def reduce(f, id_, a):
if len(a) == 0:
return id_
elif len(a) == 1:
return a[0]
else:
# can call these in parallel
res = f(reduce(f, id_, a[:len(a)//2]),
reduce(f, id_, a[len(a)//2:]))
return res
I tried passing it a comparison function:
def isequal(x, element):
if x == True: # if element has already been found in list -> True
return True
if x == element: # if key is equal to element -> True
return True
else: # o.w. -> False
return False
I realize this does not work because x is not the key I am searching for. I get how reduce works with summing and products, but I am failing to see how this function would even know what the key is to check if the next element matches.
I apologize, I am a bit new to this. Thanks in advance for any insight, I greatly appreciate it!
Based on your example, the problem you seem to be trying to solve is determining whether a value is or is not in a list. In that case reduce is probably not the best way to go about that. To check if a particular value is in a list or not, Python has a much simpler way of doing that:
my_list = [1, 6, 2, 7]
print(2 in my_list)
print(55 in my_list)
True
False
Edit: Given OP's comment that they were required to use reduce to solve the problem, the code below will work, but I'm not proud of it. ;^) To see how reduce is intended to be used, here is a good source of information.
Example:
from functools import reduce
def test_match(match_params, candidate):
pattern, found_match = match_params
if not found_match and pattern == candidate:
match_params = (pattern, True)
return match_params
num_list = [1,2,3,4,5]
_, found_match = reduce(test_match, num_list, (2, False))
print(found_match)
_, found_match = reduce(test_match, num_list, (55, False))
print(found_match)
Output:
True
False

Most "pythonic" way of populating a nested indexed list from a flat list

I have a situation where I am generating a number of template nested lists with n organised elements where each number in the template corresponds to the index from a flat list of n values:
S =[[[2,4],[0,3]], [[1,5],[6,7]],[[10,9],[8,11],[13,12]]]
For each of these templates, the values inside them correspond to the index value from a flat list like so:
A = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n"]
to get;
B = [[["c","e"],["a","d"]], [["b","f"],["g","h"]],[["k","j"],["i","l"],["n","m"]]]
How can I populate the structure S with the values from list A to get B, considering that:
- the values of list A can change in value but not in a number
- the template can have any depth of nested structure of but will only use an index from A once as the example shown above.
I did this with the very ugly append unflatten function below that works if the depth of the template is not more then 3 levels. Is there a better way of accomplishing it using generators, yield so it works for any arbitrary depth of template.
Another solution I thought but couldn't implement is to set the template as a string with generated variables and then assigning the variables with new values using eval()
def unflatten(item, template):
# works up to 3 levels of nested lists
tree = []
for el in template:
if isinstance(el, collections.Iterable) and not isinstance(el, str):
tree.append([])
for j, el2 in enumerate(el):
if isinstance(el2, collections.Iterable) and not isinstance(el2, str):
tree[-1].append([])
for k, el3 in enumerate(el2):
if isinstance(el3, collections.Iterable) and not isinstance(el3, str):
tree[-1][-1].append([])
else:
tree[-1][-1].append(item[el3])
else:
tree[-1].append(item[el2])
else:
tree.append(item[el])
return tree
I need a better solution that can be employed to accomplish this when doing the above recursively and for n = 100's of organised elements.
UPDATE 1
The timing function I am using is this one:
def timethis(func):
'''
Decorator that reports the execution time.
'''
#wraps(func)
def wrapper(*args, **kwargs):
start = time.time()
result = func(*args, **kwargs)
end = time.time()
print(func.__name__, end-start)
return result
return wrapper
and I am wrapping the function suggested by #DocDrivin inside another to call it with a one-liner. Below it is my ugly append function.
#timethis
def unflatten(A, S):
for i in range(100000):
# making sure that you don't modify S
rebuilt_list = copy.deepcopy(S)
# create the mapping dict
adict = {key: val for key, val in enumerate(A)}
# the recursive worker function
def worker(alist):
for idx, entry in enumerate(alist):
if isinstance(entry, list):
worker(entry)
else:
# might be a good idea to catch key errors here
alist[idx] = adict[entry]
#build list
worker(rebuilt_list)
return rebuilt_list
#timethis
def unflatten2(A, S):
for i in range (100000):
#up to level 3
temp_tree = []
for i, el in enumerate(S):
if isinstance(el, collections.Iterable) and not isinstance(el, str):
temp_tree.append([])
for j, el2 in enumerate(el):
if isinstance(el2, collections.Iterable) and not isinstance(el2, str):
temp_tree[-1].append([])
for k, el3 in enumerate(el2):
if isinstance(el3, collections.Iterable) and not isinstance(el3, str):
temp_tree[-1][-1].append([])
else:
temp_tree[-1][-1].append(A[el3])
else:
temp_tree[-1].append(A[el2])
else:
temp_tree.append(A[el])
return temp_tree
The recursive method is much better syntax, however, it is considerably slower then using the append method.
You can do this by using recursion:
import copy
S =[[[2,4],[0,3]], [[1,5],[6,7]],[[10,9],[8,11],[13,12]]]
A = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n"]
# making sure that you don't modify S
B = copy.deepcopy(S)
# create the mapping dict
adict = {key: val for key, val in enumerate(A)}
# the recursive worker function
def worker(alist):
for idx, entry in enumerate(alist):
if isinstance(entry, list):
worker(entry)
else:
# might be a good idea to catch key errors here
alist[idx] = adict[entry]
worker(B)
print(B)
This yields the following output for B:
[[['c', 'e'], ['a', 'd']], [['b', 'f'], ['g', 'h']], [['k', 'j'], ['i', 'l'], ['n', 'm']]]
I did not check if the list entry can actually be mapped with the dict, so you might want to add a check (marked the spot in the code).
Small edit: just saw that your desired output (probably) has a typo. Index 3 maps to "d", not to "c". You might want to edit that.
Big edit: To prove that my proposal is not as catastrophic as it seems at a first glance, I decided to include some code to test its runtime. Check this out:
import timeit
setup1 = '''
import copy
S =[[[2,4],[0,3]], [[1,5],[6,7]],[[10,9],[8,11],[13,12]]]
A = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n"]
adict = {key: val for key, val in enumerate(A)}
# the recursive worker function
def worker(olist):
alist = copy.deepcopy(olist)
for idx, entry in enumerate(alist):
if isinstance(entry, list):
worker(entry)
else:
alist[idx] = adict[entry]
return alist
'''
code1 = '''
worker(S)
'''
setup2 = '''
import collections
S =[[[2,4],[0,3]], [[1,5],[6,7]],[[10,9],[8,11],[13,12]]]
A = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n"]
def unflatten2(A, S):
#up to level 3
temp_tree = []
for i, el in enumerate(S):
if isinstance(el, collections.Iterable) and not isinstance(el, str):
temp_tree.append([])
for j, el2 in enumerate(el):
if isinstance(el2, collections.Iterable) and not isinstance(el2, str):
temp_tree[-1].append([])
for k, el3 in enumerate(el2):
if isinstance(el3, collections.Iterable) and not isinstance(el3, str):
temp_tree[-1][-1].append([])
else:
temp_tree[-1][-1].append(A[el3])
else:
temp_tree[-1].append(A[el2])
else:
temp_tree.append(A[el])
return temp_tree
'''
code2 = '''
unflatten2(A, S)
'''
print(f'Recursive func: { [i/10000 for i in timeit.repeat(setup = setup1, stmt = code1, repeat = 3, number = 10000)] }')
print(f'Original func: { [i/10000 for i in timeit.repeat(setup = setup2, stmt = code2, repeat = 3, number = 10000)] }')
I am using the timeit module to do my tests. When running this snippet, you will get an output similar to this:
Recursive func: [8.74395573977381e-05, 7.868373290111777e-05, 7.9051584698027e-05]
Original func: [3.548609419958666e-05, 3.537480780214537e-05, 3.501355930056888e-05]
These are the average times of 10000 iterations, and I decided to run it 3 times to show the fluctuation. As you can see, my function in this particular case is 2.22 to 2.50 times slower than the original, but still acceptable. The slowdown is probably due to using deepcopy.
Your test has some flaws, e.g. you redefine the mapping dict at every iteration. You wouldn't do that normally, instead you would give it as a param to the function after defining it once.
You can use generators with recursion
A = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n"]
S = [[[2,4],[0,3]], [[1,5],[6,7]],[[10,9],[8,11],[13,12]]]
A = {k: v for k, v in enumerate(A)}
def worker(alist):
for e in alist:
if isinstance(e, list):
yield list(worker(e))
else:
yield A[e]
def do(alist):
return list(worker(alist))
This is also a recursive approach, just avoiding individual item assignment and letting list do the work by reading the values "hot off the CPU" from your generator. If you want, you can Try it online!-- setup1 and setup2 copied from #DocDriven 's answer (but I recommend you don't exaggerate with the numbers, do it locally if you want to play around).
Here are example time numbers:
My result: [0.11194685893133283, 0.11086182110011578, 0.11299032904207706]
result1: [1.0810202199500054, 1.046933784848079, 0.9381260159425437]
result2: [0.23467918601818383, 0.236218704842031, 0.22498539905063808]

Python 3.x - function args type-testing

I started learning Python 3.x some time ago and I wrote a very simple code which adds numbers or concatenates lists, tuples and dicts:
X = 'sth'
def adder(*vargs):
if (len(vargs) == 0):
print('No args given. Stopping...')
else:
L = list(enumerate(vargs))
for i in range(len(L) - 1):
if (type(L[i][1]) != type(L[i + 1][1])):
global X
X = 'bad'
break
if (X == 'bad'):
print('Args have different types. Stopping...')
else:
if type(L[0][1]) == int: #num
temp = 0
for i in range(len(L)):
temp += L[i][1]
print('Sum is equal to:', temp)
elif type(L[0][1]) == list: #list
A = []
for i in range(len(L)):
A += L[i][1]
print('List made is:', A)
elif type(L[0][1]) == tuple: #tuple
A = []
for i in range(len(L)):
A += list(L[i][1])
print('Tuple made is:', tuple(A))
elif type(L[0][1]) == dict: #dict
A = L[0][1]
for i in range(len(L)):
A.update(L[i][1])
print('Dict made is:', A)
adder(0, 1, 2, 3, 4, 5, 6, 7)
adder([1,2,3,4], [2,3], [5,3,2,1])
adder((1,2,3), (2,3,4), (2,))
adder(dict(a = 2, b = 433), dict(c = 22, d = 2737))
My main issue with this is the way I am getting out of the function when args have different types with the 'X' global. I thought a while about it, but I can't see easier way of doing this (I can't simply put the else under for, because the results will be printed a few times; probably I'm messing something up with the continue and break usage).
I'm sure I'm missing an easy way to do this, but I can't get it.
Thank you for any replies. If you have any advice about any other code piece here, I would be very grateful for additional help. I probably have a lot of bad non-Pythonian habits coming from earlier C++ coding.
Here are some changes I made that I think clean it up a bit and get rid of the need for the global variable.
def adder(*vargs):
if len(vargs) == 0:
return None # could raise ValueError
mytype = type(vargs[0])
if not all(type(x) == mytype for x in vargs):
raise ValueError('Args have different types.')
if mytype is int:
print('Sum is equal to:', sum(vargs))
elif mytype is list or mytype is tuple:
out = []
for item in vargs:
out += item
if mytype is list:
print('List made is:', out)
else:
print('Tuple made is:', tuple(out))
elif mytype is dict:
out = {}
for i in vargs:
out.update(i)
print('Dict made is:', out)
adder(0, 1, 2, 3, 4, 5, 6, 7)
adder([1,2,3,4], [2,3], [5,3,2,1])
adder((1,2,3), (2,3,4), (2,))
adder(dict(a = 2, b = 433), dict(c = 22, d = 2737))
I also made some other improvements that I think are a bit more 'pythonic'. For instance
for item in list:
print(item)
instead of
for i in range(len(list)):
print(list[i])
In a function like this if there are illegal arguments you would commonly short-cuircuit and just throw a ValueError.
if bad_condition:
raise ValueError('Args have different types.')
Just for contrast, here is another version that feels more pythonic to me (reasonable people might disagree with me, which is OK by me).
The principal differences are that a) type clashes are left to the operator combining the arguments, b) no assumptions are made about the types of the arguments, and c) the result is returned instead of printed. This allows combining different types in the cases where that makes sense (e.g, combine({}, zip('abcde', range(5)))).
The only assumption is that the operator used to combine the arguments is either add or a member function of the first argument's type named update.
I prefer this solution because it does minimal type checking, and uses duck-typing to allow valid but unexpected use cases.
from functools import reduce
from operator import add
def combine(*args):
if not args:
return None
out = type(args[0])()
return reduce((getattr(out, 'update', None) and (lambda d, u: [d.update(u), d][1]))
or add, args, out)
print(combine(0, 1, 2, 3, 4, 5, 6, 7))
print(combine([1,2,3,4], [2,3], [5,3,2,1]))
print(combine((1,2,3), (2,3,4), (2,)))
print(combine(dict(a = 2, b = 433), dict(c = 22, d = 2737)))
print(combine({}, zip('abcde', range(5))))

Resources