Counter class extension - auto-increment

I am having trouble finding an elegant way to create a Counter() class that can:
Take an arbitrary number of keys and build a nested dictionary from that list of keys.
Increment the value at the end of that key path by an arbitrary amount.
For example:
counter = Counter()
for line in fin:
    if a:
        counter.incr(key1, 1)
    else:
        counter.incr(key2, key3, 2)
print(counter)
Ideally, I am hoping to get a result that looks like {key1: 20, key2: {key3: 40}}, but I am stuck on creating this arbitrarily nested dictionary from a list of keys. Any help is appreciated.

You can subclass dict and create your own nested structure.
Here's my attempt at writing such a class:
class Counter(dict):
    def incr(self, *args):
        if len(args) < 2:
            raise TypeError("incr() takes at least 2 arguments (%d given)" % len(args))
        curr = self
        # The last argument is the increment; the rest form the key path.
        keys, count = args[:-1], args[-1]
        for depth, key in enumerate(keys, 1):
            if depth == len(keys):
                curr[key] = curr.setdefault(key, 0) + count
            else:
                curr = curr.setdefault(key, {})
counter = Counter()
counter.incr('key1', 1)
counter.incr('key2', 'key3', 2)
counter.incr('key1', 7)
print(counter)  # {'key1': 8, 'key2': {'key3': 2}} (key order may vary before Python 3.7)

There are two possibilities.
First, you can always fake the nested-keys thing by using a flat Counter with a "key path" made of tuples:
counter = Counter()
for line in fin:
    if a:
        counter.incr((key1,), 1)
    else:
        counter.incr((key2, key3), 2)
But then you'll need to write a replacement for str or, better, a wrapper class that implements __str__. And while you're at it, you can easily write an incr wrapper that lets you use exactly the API you wanted:
def incr(self, *args):
    super().incr(args[:-1], args[-1])
Alternatively, you can build your own Counter-like class on top of a nested dict. The code for Counter is written in pure Python, and the source is pretty simple and readable.
From your code, it looks like you don't need to access things like counter[key2][key3] anywhere, which means the first option is probably going to be simpler and more appropriate.
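The tuple-key idea could be sketched roughly as follows. This is only a minimal sketch, assuming collections.Counter as the flat base class; the class name PathCounter and the nested() helper are hypothetical names introduced here for illustration, and nested() assumes no key path is a prefix of another.

```python
from collections import Counter


class PathCounter(Counter):
    """Flat Counter keyed by tuples of keys (hypothetical name, for illustration)."""

    def incr(self, *args):
        # The last argument is the increment; everything before it is the key path.
        if len(args) < 2:
            raise TypeError("incr() takes at least 2 arguments (%d given)" % len(args))
        *path, count = args
        self[tuple(path)] += count

    def nested(self):
        # Build a nested-dict view of the flat counts (hypothetical helper).
        result = {}
        for path, count in self.items():
            node = result
            for key in path[:-1]:
                node = node.setdefault(key, {})
            node[path[-1]] = count
        return result


counter = PathCounter()
counter.incr('key1', 1)
counter.incr('key2', 'key3', 2)
counter.incr('key1', 7)
print(counter.nested())  # {'key1': 8, 'key2': {'key3': 2}}
```

Keeping the storage flat means the usual Counter behaviour (missing keys count as 0) still applies, and the nested form is only built when it is needed for display.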

A Counter is meant to map keys to integer counts, so it is not a good fit for representing a nested dictionary.
Here is one way to do this with a normal dictionary (counter = {}). First, to increment the value for a single key:
counter[key1] = counter.setdefault(key1, 0) + 1
Or for an arbitrary list of keys to create the nested structure:
tmp = counter
for key in key_list[:-1]:
    tmp = tmp.setdefault(key, {})
tmp[key_list[-1]] = tmp.setdefault(key_list[-1], 0) + 1
I would probably turn this into the following function:
def incr(counter, val, *keys):
    tmp = counter
    for key in keys[:-1]:
        tmp = tmp.setdefault(key, {})
    tmp[keys[-1]] = tmp.setdefault(keys[-1], 0) + val
Example:
>>> counter = {}
>>> incr(counter, 1, 'a')
>>> counter
{'a': 1}
>>> incr(counter, 2, 'a')
>>> counter
{'a': 3}
>>> incr(counter, 2, 'b', 'c', 'd')
>>> counter
{'a': 3, 'b': {'c': {'d': 2}}}
>>> incr(counter, 3, 'b', 'c', 'd')
>>> counter
{'a': 3, 'b': {'c': {'d': 5}}}
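One caveat with the setdefault walk: if a key already holds a count and is later used as an intermediate key (or the other way round), the function fails with a rather indirect AttributeError or TypeError. A hedged variant that makes the conflict explicit might look like this; the name incr_checked is hypothetical and not part of the answer above.

```python
def incr_checked(counter, val, *keys):
    """Like incr(), but raise a clear error on conflicting key paths."""
    if not keys:
        raise TypeError("incr_checked() needs at least one key")
    node = counter
    for key in keys[:-1]:
        node = node.setdefault(key, {})
        if not isinstance(node, dict):
            # An earlier call stored a count where a sub-dictionary is needed.
            raise ValueError("%r already holds a count, not a nested dict" % (key,))
    leaf = keys[-1]
    if isinstance(node.get(leaf), dict):
        # An earlier call stored a sub-dictionary where a count is needed.
        raise ValueError("%r already holds a nested dict, not a count" % (leaf,))
    node[leaf] = node.get(leaf, 0) + val


counter = {}
incr_checked(counter, 1, 'a')
incr_checked(counter, 2, 'b', 'c')
print(counter)  # {'a': 1, 'b': {'c': 2}}
```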

Related

Python: Convert 2d list to dictionary with indexes as values

I have a 2d list with arbitrary strings like this:
lst = [['a', 'xyz' , 'tps'], ['rtr' , 'xyz']]
I want to create a dictionary out of this:
{'a': 0, 'xyz': 1, 'tps': 2, 'rtr': 3}
How do I do this? An existing answer covers a 1D list with non-repeated values, but I have a 2D list and values can repeat. Is there a generic way of doing this?
Maybe you could use two for-loops:
lst = [['a', 'xyz' , 'tps'], ['rtr' , 'xyz']]
d = {}
overall_idx = 0
for sub_lst in lst:
    for word in sub_lst:
        if word not in d:
            d[word] = overall_idx
            # Increment overall_idx below if you want to only increment if word is not previously seen
            # overall_idx += 1
        overall_idx += 1
print(d)
Output:
{'a': 0, 'xyz': 1, 'tps': 2, 'rtr': 3}
You could first convert the list of lists to a flat list using a 'double' list comprehension.
Next, get rid of all the duplicates using a dictionary comprehension. We could use a set for that, but we would lose the order.
Finally, use another dictionary comprehension to get the desired result.
lst = [['a', 'xyz' , 'tps'], ['rtr' , 'xyz']]
# flatten list of lists to a list
flat_list = [item for sublist in lst for item in sublist]
# remove duplicates
ordered_set = {x:0 for x in flat_list}.keys()
# create required output
the_dictionary = {v:i for i, v in enumerate(ordered_set)}
print(the_dictionary)
""" OUTPUT
{'a': 0, 'xyz': 1, 'tps': 2, 'rtr': 3}
"""
Also, with collections and itertools:
import itertools
from collections import OrderedDict
lstdict={}
lst = [['a', 'xyz' , 'tps'], ['rtr' , 'xyz']]
lstkeys = list(OrderedDict(zip(itertools.chain(*lst), itertools.repeat(None))))
lstdict = {lstkeys[i]: i for i in range(0, len(lstkeys))}
lstdict
output:
{'a': 0, 'xyz': 1, 'tps': 2, 'rtr': 3}
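The flatten, de-duplicate, enumerate steps used in the answers above can also be combined more compactly. This is a sketch that assumes Python 3.7 or later, where plain dicts preserve insertion order; the variable names are illustrative.

```python
from itertools import chain

lst = [['a', 'xyz', 'tps'], ['rtr', 'xyz']]

# dict.fromkeys keeps the first occurrence of each word, in order (Python 3.7+).
unique_words = dict.fromkeys(chain.from_iterable(lst))
word_to_index = {word: i for i, word in enumerate(unique_words)}
print(word_to_index)  # {'a': 0, 'xyz': 1, 'tps': 2, 'rtr': 3}
```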

Erroneous behaviour while updating nested dictionary python3

While working with the defaultdict class from the collections package in Python 3.7, I see that a new key is created as a duplicate of the last key's value instead of a freshly initialized dictionary. Is there a way to initialize each new element with a given dictionary, which is init_dict in the example code below?
Example code to reproduce error:
from collections import defaultdict
init_dict = {'buy_qty': 0,
'sell_qty': 0}
pnl = defaultdict(lambda: init_dict)
pnl['a']['buy_qty'] += 1
pnl['a']['sell_qty'] += 1
Now when I do
pnl['b']
gives me
{'buy_qty': 1, 'sell_qty': 1}
I am looking for pnl['b'] to be initialized with init_dict. How can I achieve that?
You're copying by reference, not by value: the lambda returns the same init_dict object every time, so whatever you do to one dictionary also affects the other.
You can check this with the id() function:
print(id(pnl['a']))
print(id(pnl['b']))
print(id(pnl['a']) == id(pnl['b']))
Which will print the same memory address twice (the exact number varies between runs):
1817103232768
1817103232768
True
verifying that they are the same object. You can fix this by assigning a shallow copy of the dictionary using dict.copy(), as mentioned in the comments:
pnl = defaultdict(lambda: init_dict.copy())
Or calling the dict() constructor:
pnl = defaultdict(lambda: dict(init_dict))
Or using ** from PEP 448 -- Additional Unpacking Generalizations:
pnl = defaultdict(lambda: {**init_dict})
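A quick way to check that the fix works is to confirm that each key now gets its own dictionary. The sketch below also shows one assumption worth stating: all three variants make shallow copies, so a template that itself contains mutable values would still share them. The template with a 'fills' list is a hypothetical example, not something from the question, and copy.deepcopy is one possible way to handle it.

```python
from collections import defaultdict
from copy import deepcopy

init_dict = {'buy_qty': 0, 'sell_qty': 0}

# Each missing key gets its own copy of the template.
pnl = defaultdict(lambda: init_dict.copy())
pnl['a']['buy_qty'] += 1
print(pnl['b'])               # {'buy_qty': 0, 'sell_qty': 0}
print(pnl['a'] is pnl['b'])   # False

# If the template held mutable values (hypothetical 'fills' list),
# a shallow copy would still share them; deepcopy avoids that.
template = {'qty': 0, 'fills': []}
book = defaultdict(lambda: deepcopy(template))
book['a']['fills'].append(10)
print(book['b']['fills'])     # []
```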
Additionally, consider using a collections.Counter to do the counting, instead of initializing zero count dictionaries yourself:
from collections import defaultdict, Counter
pnl = defaultdict(Counter)
pnl['a']['buy_qty'] += 1
pnl['a']['sell_qty'] += 1
print(pnl)
# defaultdict(<class 'collections.Counter'>, {'a': Counter({'buy_qty': 1, 'sell_qty': 1})})
print(pnl['b']['buy_qty'])
# 0
print(pnl['b']['buy_qty'])
# 0
pnl['b']['buy_qty'] += 1
pnl['b']['sell_qty'] += 1
print(pnl)
# defaultdict(<class 'collections.Counter'>, {'a': Counter({'buy_qty': 1, 'sell_qty': 1}), 'b': Counter({'buy_qty': 1, 'sell_qty': 1})})
Counter is a subclass of dict, so these objects work the same way as normal dictionaries.

Fastest way to find all the indexes of maximum value in a list - Python

I have a list as follows:
input_list = [2, 3, 5, 2, 5, 1, 5]
I want to get all the indexes of the maximum value, and I need an efficient solution. The output should be as follows:
output = [2, 4, 6] (5 is the maximum value in the list above)
I have tried the code below:
m = max(input_list)
output = [i for i, j in enumerate(input_list) if j == m]
Is there a more efficient solution?
from collections import defaultdict
dic = defaultdict(list)
input_list = [2, 3, 5, 2, 5, 1, 5]
# Group the indexes of every value, then look up the maximum.
for i in range(len(input_list)):
    dic[input_list[i]] += [i]
max_value = max(input_list)
Sol = dic[max_value]
You can use numpy (numpy arrays are very fast):
import numpy as np
input_list= np.array([2, 3, 5, 2, 5, 1, 5])
i, = np.where(input_list == np.max(input_list))
print(i)
Output:
[2 4 6]
Here's the approach described in the comments. Even if you use a library, you fundamentally need to traverse the list at least once to solve this problem (assuming the input list is unsorted), so the lower bound for the algorithm is Omega(size_of_list). If the list is sorted, we can use binary search to solve the problem.
def max_indexes(l):
    try:
        assert l != []
        max_element = l[0]
        indexes = [0]
        for index, element in enumerate(l[1:]):
            if element > max_element:
                max_element = element
                indexes = [index + 1]
            elif element == max_element:
                indexes.append(index + 1)
        return indexes
    except AssertionError:
        print ('input_list is empty')
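For the sorted case mentioned above, a minimal sketch using the standard bisect module could look like this. It assumes the list is sorted in ascending order; the function name max_indexes_sorted is hypothetical. The search itself is O(log n), although building the result list still costs time proportional to the number of maxima.

```python
from bisect import bisect_left


def max_indexes_sorted(sorted_list):
    """Indexes of the maximum in a list sorted in ascending order."""
    if not sorted_list:
        return []
    # The maximum is the last element; bisect_left finds its first occurrence.
    first = bisect_left(sorted_list, sorted_list[-1])
    return list(range(first, len(sorted_list)))


print(max_indexes_sorted([1, 2, 2, 3, 5, 5, 5]))  # [4, 5, 6]
```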
Use a single for loop that iterates over the list just once, which is O(n):
from itertools import islice
input_list = [2, 3, 5, 2, 5, 1, 5]
def max_indexes(l):
    max_item = l[0]
    indexes = [0]
    for i, item in enumerate(islice(l, 1, None), 1):
        if item < max_item:
            continue
        elif item > max_item:
            max_item = item
            indexes = [i]
        elif item == max_item:
            indexes.append(i)
    return indexes
Think of it this way: unless you iterate through the whole list once, which is O(n) with n being the length of the list, you won't be able to compare the maximum with all values in the list. So the best you can do is O(n), which you already seem to be doing in your example.
So I am not sure you can do it faster than O(n) with the list approach.

return dictionary of file names as keys and word lists with words unique to file as values

I am trying to write a function that extracts only the words unique to each key and lists them in a dictionary output like {"key1": "unique words", "key2": "unique words", ... }. I start out with a dictionary. To test, I created a simple dictionary:
d = {1:["one", "two", "three"], 2:["two", "four", "five"], 3:["one","four", "six"]}
My output should be:
{1:"three",
2:"five",
3:"six"}
I am thinking of maybe splitting it into separate lists:
def return_unique(dct):
    Klist = list(dct.keys())
    Vlist = list(dct.values())
    aList = []
    for i in range(len(Vlist)):
        for j in Vlist[i]:
            if
What I'm stuck on is how to tell Python to do this: if Vlist[i][j] is not in the rest of Vlist, then aList.append(Vlist[i][j]).
Thank you.
You can try something like this:
def return_unique(data):
    all_values = []
    for i in data.values(): # Get all values
        all_values = all_values + i
    unique_values = set([x for x in all_values if all_values.count(x) == 1]) # Values which are not duplicated
    for key, value in data.items(): # For Python 3.x ( For Python 2.x -> data.iteritems())
        for item in value: # Comparing values of two lists
            for item1 in unique_values:
                if item == item1:
                    data[key] = item
    return data
d = {1:["one", "two", "three"], 2:["two", "four", "five"], 3:["one","four", "six"]}
print (return_unique(d))
result >> {1: 'three', 2: 'five', 3: 'six'}
Since a key may have more than one unique word associated with it, it makes sense for the values in the new dictionary to be a container type object to hold the unique words.
The set difference operator returns the difference between 2 sets:
>>> a = set([1, 2, 3])
>>> b = set([2, 4, 6])
>>> a - b
{1, 3}
We can use this to get the values unique to each key. Packaging these into a simple function yields:
def unique_words_dict(data):
    res = {}
    values = []
    for k in data:
        for g in data:
            if g != k:
                values += data[g]
        res[k] = set(data[k]) - set(values)
        values = []
    return res
>>> d = {1:["one", "two", "three"],
...      2:["two", "four", "five"],
...      3:["one","four", "six"]}
>>> unique_words_dict(d)
{1: {'three'}, 2: {'five'}, 3: {'six'}}
If you only had to do this once, then you might be interested in the less efficient but more concise dictionary comprehension:
>>> from functools import reduce
>>> {k: set(d[k]) - set(reduce(lambda a, b: a+b, [d[g] for g in d if g!=k], [])) for k in d}
{1: {'three'}, 2: {'five'}, 3: {'six'}}
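Both versions above rebuild the list of "other" words for every key. An alternative sketch, assuming the same input shape, counts how many keys each word appears under and keeps the words whose count is 1. The function name unique_words_by_count is hypothetical.

```python
from collections import Counter


def unique_words_by_count(data):
    """Map each key to the set of words that appear under no other key."""
    # Count in how many keys each word appears (set() ignores repeats within a key).
    owners = Counter(word for words in data.values() for word in set(words))
    return {key: {w for w in words if owners[w] == 1} for key, words in data.items()}


d = {1: ["one", "two", "three"], 2: ["two", "four", "five"], 3: ["one", "four", "six"]}
print(unique_words_by_count(d))  # {1: {'three'}, 2: {'five'}, 3: {'six'}}
```

This makes one pass to count and one pass to filter, so the work grows with the total number of words rather than with the number of key pairs.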

Using python need to get the substrings

After executing the code below, I need to print the values [1, 12, 123, 2, 23, 3, 13], but I am getting [1, 12, 123, 2, 23, 3]. The value 13 is missing. Can anyone tell me the reason and how to fix it?
def get_all_substrings(string):
    length = len(string)
    list = []
    for i in range(length):
        for j in range(i,length):
            list.append(string[i:j+1])
    return list
values = get_all_substrings('123')
results = list(map(int, values))
print(results)
count = 0
for i in results:
    if i > 1 :
        if (i % 2) != 0:
            count += 1
print(count)
There is a pretty straightforward issue in your nested for loops within get_all_substrings(); let's walk through it!
You are iterating over each element of your string 123:
for i in range(length): # we know length to be 3, so range is 0, 1, 2
You then iterate over each subsequent element from the current i:
for j in range(i,length):
Finally you append a string from position i to j+1 using the slice operator:
list.append(string[i:j+1])
But what exactly is happening? Well, we can step through further!
The first value of i is 0, so let's skip the first for and go to the second:
for j in range(0, 3): # i.e. the whole string!
    # you would eventually execute all of the following
    list.append(string[0:0 + 1]) # '1'
    list.append(string[0:1 + 1]) # '12'
    list.append(string[0:2 + 1]) # '123'
    # but wait...where is '13'???? (this is your hint!)
The next value of i is 1:
for j in range(1, 3):
    # you would eventually execute all of the following
    list.append(string[1:1 + 1]) # '2'
    list.append(string[1:2 + 1]) # '23'
    # notice how we are only grabbing values of position i or more?
Finally, you get to i = 2:
for j in range(2, 3): # i.e. only the last character
    # you would eventually execute all of the following
    list.append(string[2:2 + 1]) # '3'
I've shown you what is happening (as you asked in your question); I leave it to you to devise your own solution. A couple of notes:
You need to look at all index combinations from position i, not only contiguous slices
Don't name objects by their type (i.e. don't name a list object list)
I would try something like this, using itertools and the powerset() recipe:
from itertools import chain, combinations
def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))
output = list(map(''.join, powerset('123')))
output.pop(0)  # drop the empty string produced by the empty combination
Here is another option, using combinations:
from itertools import combinations
def get_sub_ints(raw):
    return [''.join(sub) for i in range(1, len(raw) + 1) for sub in combinations(raw, i)]
if __name__ == '__main__':
    print(get_sub_ints('123'))
>>> ['1', '2', '3', '12', '13', '23', '123']
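To tie the combinations approach back to the original question, here is a hedged sketch that converts the subsequences to integers and applies the same counting rule the question uses (odd values greater than 1). The function name sub_ints is hypothetical, and the values come out grouped by length rather than in the order listed in the question.

```python
from itertools import combinations


def sub_ints(digits):
    """Every non-empty subsequence of `digits` (order preserved), as an int."""
    return [int(''.join(sub))
            for size in range(1, len(digits) + 1)
            for sub in combinations(digits, size)]


results = sub_ints('123')
print(results)  # [1, 2, 3, 12, 13, 23, 123]

# Same counting rule as in the question: odd values greater than 1.
count = sum(1 for value in results if value > 1 and value % 2 != 0)
print(count)  # 4
```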

Resources