Dict comprehension from dict to inverse dict - python-3.x

I have the following data:
a = {1: {'data': 243}, 2: {'data': 253}, 4: {'data':243}}
And I want to invert it, so that the data values become the keys and the original keys become the values. My first try:
b = dict(map(lambda id: (a[id]['data'], id), a))
But when I do this, the 1 gets overwritten by the 4, so result will be:
{243: 4, 253: 2}
So what I would like to get is a structure like this:
{243: [1, 4], 253: [2]}
How do I do this?

I find the code below a simpler and more readable way of approaching your problem.
from collections import defaultdict
a = {1: {'data': 243}, 2: {'data': 253}, 4: {'data':243}}
result = defaultdict(list)
for k, v in a.items():
    result[v['data']].append(k)
print(result)
Output:
defaultdict(<class 'list'>, {243: [1, 4], 253: [2]})
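If the defaultdict wrapper in the printed output is unwanted, one option is to convert the result to a plain dict once the grouping is done. A minimal sketch, reusing the names from the snippet above (the name plain is only illustrative):

```python
from collections import defaultdict

a = {1: {'data': 243}, 2: {'data': 253}, 4: {'data': 243}}

result = defaultdict(list)
for k, v in a.items():
    result[v['data']].append(k)

# dict() copies the key/value pairs into a regular dictionary,
# so looking up a missing key raises KeyError again.
plain = dict(result)
print(plain)
```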

This can be done with a dict comprehension and itertools.groupby(). Because groupby only groups consecutive items that share a key, the items must first be sorted with the same key function.
from itertools import groupby
a = {1: {'data': 243}, 2: {'data': 253}, 4: {'data': 243}}
# key extractor function suitable for both sorted() and groupby()
keyfunc = lambda i: i[1]['data']
{g[0]: [i[0] for i in g[1]] for g in groupby(sorted(a.items(), key=keyfunc), key=keyfunc)}
Here g is a grouping tuple (key, items), where
g[0] is whatever keyfunc extracts (in this case the 'data' value), and
g[1] is an iterable over dict items, i.e. (key, value) tuples, hence the additional list comprehension to extract the keys only.
Result:
{243: [1, 4], 253: [2]}
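To see why the sort matters, it may help to compare groupby on unsorted and sorted input. This is only an illustrative sketch; the helper name data_of and the two result variables are assumptions, not part of the answer above.

```python
from itertools import groupby

a = {1: {'data': 243}, 2: {'data': 253}, 4: {'data': 243}}

def data_of(item):
    # item is a (key, value) tuple from a.items()
    return item[1]['data']

# Without sorting, the two 243 entries are not adjacent,
# so groupby reports them as two separate groups.
unsorted_groups = [
    (key, [k for k, _ in group])
    for key, group in groupby(a.items(), key=data_of)
]

# After sorting by the same key, equal values are adjacent.
sorted_groups = [
    (key, [k for k, _ in group])
    for key, group in groupby(sorted(a.items(), key=data_of), key=data_of)
]

print(unsorted_groups)
print(sorted_groups)
```

Building a dict from the unsorted groups would silently keep only the last group for 243, which is the same overwriting problem as in the question.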

Related

Dictionary comprehension - dynamically generate key-value pairs?

I want to achieve the same result of
{i: i+1 for i in range(4)} # {0: 1, 1: 2, 2: 3, 3: 4}
But dynamically generate key: value part with myfunc(i), how do I do that?
Functions that return {i: i+1} or (i, i+1) won't work:
{{i: i+1} for i in range(4)} # TypeError: unhashable type: 'dict'
dict(map(myfunc, range(4)))
See: https://docs.python.org/3.9/library/stdtypes.html#dict
Example:
>>> dict([(1,2), (3,4)])
{1: 2, 3: 4}
>>> dict(map(lambda x: (x, x+1), range(4)))
{0: 1, 1: 2, 2: 3, 3: 4}
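As a sketch of how this looks with a named function instead of a lambda: myfunc below is a hypothetical stand-in that returns one (key, value) tuple per input, which can be consumed either by a comprehension or by the dict constructor.

```python
def myfunc(i):
    # Hypothetical helper: returns a (key, value) pair for each input.
    return i, i + 1

# Unpack each pair inside a dict comprehension.
from_comprehension = {k: v for k, v in map(myfunc, range(4))}

# Or pass the pairs straight to the dict constructor.
from_constructor = dict(myfunc(i) for i in range(4))

print(from_comprehension)
print(from_constructor)
```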
The following doesn't work because it is a set comprehension whose elements are dictionaries, and set elements (like dictionary keys) must be hashable.
{{i: i+1} for i in range(4)} # TypeError: unhashable type: 'dict'
It fails for the same reason as this:
a = {1:2}
b = {a:3} # TypeError: unhashable type: 'dict'
The point is that dictionary keys must be hashable, which in practice means immutable types such as strings, numbers, tuples (of hashable elements) or frozensets. In other words, you cannot use mutable types such as dictionaries, lists or sets as a dictionary key, or as an element of a set.
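A quick way to check whether a value can serve as a key is to call the built-in hash() on it. The sketch below uses a hypothetical helper is_hashable for illustration only.

```python
def is_hashable(obj):
    # hash() raises TypeError for unhashable objects such as list or dict.
    try:
        hash(obj)
    except TypeError:
        return False
    return True

checks = {
    'tuple of ints': is_hashable((1, 2)),
    'tuple holding a list': is_hashable((1, [2])),
    'frozenset': is_hashable(frozenset({1, 2})),
    'list': is_hashable([1, 2]),
    'dict': is_hashable({1: 2}),
}
print(checks)

# Hashable values work as keys.
valid = {(1, 2): 'tuple key', frozenset({1, 2}): 'frozenset key'}
```

Note that a tuple is only hashable when everything inside it is hashable too.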

Python Groupby keys list of dictionaries

I have the following for instance:
x = [{'A':1},{'A':1},{'A':2},{'B':1},{'B':1},{'B':2},{'B':3},{'C':1},{'D':1}]
and I would like to get a dictionary like this:
x = [{'A': [1,2], 'B': [1,2,3], 'C':[1], 'D': [1]}]
Do you have any idea how I could get this please?
You could use a collections.defaultdict of sets to collect unique values, then convert the final result to a dictionary with values as lists using a dict comprehension:
from collections import defaultdict
lst = [{'A':1},{'A':1},{'A':2},{'B':1},{'B':1},{'B':2},{'B':3},{'C':1},{'D':1}]
result = defaultdict(set)
for dic in lst:
    for key, value in dic.items():
        result[key].add(value)
print({key: list(value) for key, value in result.items()})
Output:
{'A': [1, 2], 'B': [1, 2, 3], 'C': [1], 'D': [1]}
Although it's probably better to add your data directly to the defaultdict to begin with, instead of creating a list of singleton dictionaries (a data structure I don't recommend) and then converting the result.
Using dict.setdefault
Ex:
x = [{'A':1},{'A':1},{'A':2},{'B':1},{'B':1},{'B':2},{'B':3},{'C':1},{'D':1}]
res = {}
for i in x:
    for k, v in i.items():
        res.setdefault(k, set()).add(v)
# or: res = [{k: list(v) for k, v in res.items()}]
print(res)
Output:
{'A': {1, 2}, 'B': {1, 2, 3}, 'C': {1}, 'D': {1}}
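Sets do not promise any particular iteration order, so if the lists in the final result should come out in a predictable order, one option is to sort each set when converting. A small sketch under that assumption (the name ordered is illustrative):

```python
x = [{'A': 1}, {'A': 1}, {'A': 2}, {'B': 1}, {'B': 1},
     {'B': 2}, {'B': 3}, {'C': 1}, {'D': 1}]

res = {}
for d in x:
    for k, v in d.items():
        res.setdefault(k, set()).add(v)

# sorted() returns a new list, so each set becomes an ordered list.
ordered = {k: sorted(v) for k, v in res.items()}
print(ordered)
```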

How to get a list of objects by maximum attribute value from N multi-list of objects?

I have a multi-list of objects as follows (simplified version)
listA = [[obj1(val=1),obj2(val=1)],[obj2(val=4),obj3(val=2)]]
listB = [[obj4(val=1),obj5(val=1)],[obj6(val=5),obj7(val=3)]]
listC = [[obj8(val=1),obj9(val=1)],[obj10(val=6),obj11(val=4)]]
I want to get a list of objects from the above multi-list which has the maximum value of a certain attribute by comparing the sub-lists of each multi-list. If the value of the attribute is the same for all the compared objects, it should get any one object.
output:
maxList = [obj1(val=1),obj10(val=6)]
There is a similar question about getting the object with the maximum attribute value from a list, but this case involves multi-lists. I know this can be achieved with nested for loops, but there must be a better way to do this with itertools and getattr?
To simplify, let's demonstrate on regular integers. Adapt this approach to your object.
Given
import itertools as it
a = [[1, 1], [3, 2]]
b = [[1, 1], [5, 3]]
c = [[1, 1], [6, 3]]
Code
list(map(max, [list(it.chain(*col)) for col in zip(a, b, c)]))
# [1, 6]
Equivalently
[max([x for x in it.chain(*col)]) for col in zip(a, b, c)]
# [1, 6]
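Adapting this to objects could look roughly like the sketch below. The Obj class, its name and val fields, and the sample data are assumptions made for illustration; the key idea is passing key=attrgetter('val') to max.

```python
import itertools as it
from dataclasses import dataclass
from operator import attrgetter

@dataclass
class Obj:
    # Hypothetical stand-in for the objects in the question.
    name: str
    val: int

list_a = [[Obj('obj1', 1), Obj('obj2', 1)], [Obj('obj3', 4), Obj('obj4', 2)]]
list_b = [[Obj('obj5', 1), Obj('obj6', 1)], [Obj('obj7', 5), Obj('obj8', 3)]]
list_c = [[Obj('obj9', 1), Obj('obj10', 1)], [Obj('obj11', 6), Obj('obj12', 4)]]

# zip pairs up the sub-lists by position; chain.from_iterable merges each
# group of sub-lists; max compares the objects by their val attribute.
max_list = [
    max(it.chain.from_iterable(col), key=attrgetter('val'))
    for col in zip(list_a, list_b, list_c)
]
print([(o.name, o.val) for o in max_list])
```

When several objects tie for the maximum, max returns the first one it encounters, which satisfies the "any one object" requirement.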

populating a dictionary of lists behavior [duplicate]

My attempt to programmatically create a dictionary of lists is failing to allow me to individually address dictionary keys. Whenever I create the dictionary of lists and try to append to one key, all of them are updated. Here's a very simple test case:
data = {}
data = data.fromkeys(range(2),[])
data[1].append('hello')
print data
Actual result: {0: ['hello'], 1: ['hello']}
Expected result: {0: [], 1: ['hello']}
Here's what works
data = {0:[],1:[]}
data[1].append('hello')
print data
Actual and Expected Result: {0: [], 1: ['hello']}
Why is the fromkeys method not working as expected?
When [] is passed as the second argument to dict.fromkeys(), all values in the resulting dict will be the same list object.
In Python 2.7 or above, use a dict comprehension instead:
data = {k: [] for k in range(2)}
In earlier versions of Python, there is no dict comprehension, but a list comprehension can be passed to the dict constructor instead:
data = dict([(k, []) for k in range(2)])
In 2.4-2.6, it is also possible to pass a generator expression to dict, and the surrounding parentheses can be dropped:
data = dict((k, []) for k in range(2))
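The difference between one shared list and one list per key can be checked with the is operator. A small sketch in Python 3 syntax; the variable names are illustrative.

```python
shared = dict.fromkeys(range(2), [])     # one list object, referenced twice
separate = {k: [] for k in range(2)}     # a new list object per key

same_object = shared[0] is shared[1]
distinct_objects = separate[0] is not separate[1]

shared[1].append('hello')
separate[1].append('hello')

print(same_object, shared)
print(distinct_objects, separate)
```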
Try using a defaultdict instead:
from collections import defaultdict
data = defaultdict(list)
data[1].append('hello')
This way, the keys don't need to be initialized with empty lists ahead of time. The defaultdict() object instead calls the factory function given to it, every time a key is accessed that doesn't exist yet. So, in this example, attempting to access data[1] triggers data[1] = list() internally, giving that key a new empty list as its value.
The original code with .fromkeys shares one (mutable) list. Similarly,
alist = [1]
data = dict.fromkeys(range(2), alist)
alist.append(2)
print(data)
would output {0: [1, 2], 1: [1, 2]}. This is called out in the dict.fromkeys() documentation:
All of the values refer to just a single instance, so it generally doesn’t make sense for value to be a mutable object such as an empty list.
Another option is to use the dict.setdefault() method, which retrieves the value for a key after first checking it exists and setting a default if it doesn't. .append can then be called on the result:
data = {}
data.setdefault(1, []).append('hello')
Finally, to create a dictionary from a list of known keys and a given "template" list (where each value should start with the same elements, but be a distinct list), use a dictionary comprehension and copy the initial list:
alist = [1]
data = {key: alist[:] for key in range(2)}
Here, alist[:] creates a shallow copy of alist, and this is done separately for each value. See How do I clone a list so that it doesn't change unexpectedly after assignment? for more techniques for copying the list.
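One caveat worth sketching: a shallow copy only duplicates the outer list. If the template itself contains mutable objects, those inner objects are still shared, and copy.deepcopy is one way to avoid that. The names template, shallow and deep below are illustrative.

```python
import copy

template = [[0, 0]]  # a list that contains another list

shallow = {key: template[:] for key in range(2)}
deep = {key: copy.deepcopy(template) for key in range(2)}

# The inner list is shared by every shallow copy (and by template itself).
shallow[0][0].append('x')

# Each deep copy owns its inner list, so only key 0 changes.
deep[0][0].append('x')

print(shallow)
print(deep)
```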
You could use a dict comprehension:
>>> keys = ['a','b','c']
>>> value = [0, 0]
>>> {key: list(value) for key in keys}
{'a': [0, 0], 'b': [0, 0], 'c': [0, 0]}
This answer explains the behavior to anyone flummoxed by the results they get when trying to instantiate a dict with fromkeys() using a mutable default value.
Consider:
#Python 3.4.3 (default, Nov 17 2016, 01:08:31)
# start by validating that different variables pointing to an
# empty mutable are indeed different references.
>>> l1 = []
>>> l2 = []
>>> id(l1)
140150323815176
>>> id(l2)
140150324024968
So any change to l1 will not affect l2, and vice versa.
This holds for any mutable object, including a dict.
# create a new dict from an iterable of keys
>>> dict1 = dict.fromkeys(['a', 'b', 'c'], [])
>>> dict1
{'c': [], 'b': [], 'a': []}
This can be a handy function.
Here we assign to each key a default value, which happens to be an empty list.
# the dict has its own id.
>>> id(dict1)
140150327601160
# but look at the ids of the values.
>>> id(dict1['a'])
140150323816328
>>> id(dict1['b'])
140150323816328
>>> id(dict1['c'])
140150323816328
Indeed they are all using the same ref!
A change to one is a change to all, since they are in fact the same object!
>>> dict1['a'].append('apples')
>>> dict1
{'c': ['apples'], 'b': ['apples'], 'a': ['apples']}
>>> id(dict1['a'])
140150323816328
>>> id(dict1['b'])
140150323816328
>>> id(dict1['c'])
140150323816328
For many, this is not what was intended!
Now let's try making an explicit copy of the list used as the default value.
>>> empty_list = []
>>> id(empty_list)
140150324169864
and now create a dict with a copy of empty_list.
>>> dict2 = dict.fromkeys(['a', 'b', 'c'], empty_list[:])
>>> id(dict2)
140150323831432
>>> id(dict2['a'])
140150327184328
>>> id(dict2['b'])
140150327184328
>>> id(dict2['c'])
140150327184328
>>> dict2['a'].append('apples')
>>> dict2
{'c': ['apples'], 'b': ['apples'], 'a': ['apples']}
Still no joy!
I hear someone shout, it's because I used an empty list!
>>> not_empty_list = [0]
>>> dict3 = dict.fromkeys(['a', 'b', 'c'], not_empty_list[:])
>>> dict3
{'c': [0], 'b': [0], 'a': [0]}
>>> dict3['a'].append('apples')
>>> dict3
{'c': [0, 'apples'], 'b': [0, 'apples'], 'a': [0, 'apples']}
The default behavior of fromkeys() is to assign None to the value.
>>> dict4 = dict.fromkeys(['a', 'b', 'c'])
>>> dict4
{'c': None, 'b': None, 'a': None}
>>> id(dict4['a'])
9901984
>>> id(dict4['b'])
9901984
>>> id(dict4['c'])
9901984
Indeed, all of the values are the same (and the only!) None.
Now, let's iterate, in one of a myriad number of ways, through the dict and change the value.
>>> for k, _ in dict4.items():
...     dict4[k] = []
>>> dict4
{'c': [], 'b': [], 'a': []}
Hmm. Looks the same as before!
>>> id(dict4['a'])
140150318876488
>>> id(dict4['b'])
140150324122824
>>> id(dict4['c'])
140150294277576
>>> dict4['a'].append('apples')
>>> dict4
{'c': [], 'b': [], 'a': ['apples']}
But they are indeed different []s, which was in this case the intended result.
You can use this:
l = ['a', 'b', 'c']
d = dict((k, [0, 0]) for k in l)
You are populating your dictionaries with references to a single list so when you update it, the update is reflected across all the references. Try a dictionary comprehension instead. See
Create a dictionary with list comprehension in Python
d = {k : v for k in blah blah blah}
You could also assign a new list to the key instead of appending to the shared one:
data[1] = ['hello']

flatten a nested list with indices in python

I have a list ['','','',['',[['a','b'],['c']]],[[['a','b'],['c']]],[[['d']]]]
I want to flatten the list with indices and the output should be as follows:
flat list=['','','','','a','b','c','a','b','c','d']
indices=[0,1,2,3,3,3,3,4,4,4,5]
How to do this?
I have tried this:
def flat(nums):
    res = []
    index = []
    for i in range(len(nums)):
        if isinstance(nums[i], list):
            res.extend(nums[i])
            index.extend([i]*len(nums[i]))
        else:
            res.append(nums[i])
            index.append(i)
    return res,index
But this doesn't work as expected.
TL;DR
This implementation handles nested iterables with unbounded depth:
def enumerate_items_from(iterable):
    cursor_stack = [iter(iterable)]
    item_index = -1
    while cursor_stack:
        sub_iterable = cursor_stack[-1]
        try:
            item = next(sub_iterable)
        except StopIteration:
            cursor_stack.pop()
            continue
        # Only items pulled from the top-level iterable advance the index
        if len(cursor_stack) == 1:
            item_index += 1
        # Strings are iterable but are treated as atomic items
        if not isinstance(item, str):
            try:
                cursor_stack.append(iter(item))
                continue
            except TypeError:
                pass
        yield item, item_index
def flat(iterable):
    return map(list, zip(*enumerate_items_from(iterable)))
Which can be used to produce the desired output:
>>> nested = ['', '', '', ['', [['a', 'b'], ['c']]], [[['a', 'b'], ['c']]], [[['d']]]]
>>> flat_list, item_indexes = flat(nested)
>>> print(item_indexes)
[0, 1, 2, 3, 3, 3, 3, 4, 4, 4, 5]
>>> print(flat_list)
['', '', '', '', 'a', 'b', 'c', 'a', 'b', 'c', 'd']
Note that you should probably put the index first, to mimic what enumerate does. That would make it easier to use for people who already know enumerate.
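An index-first variant could look like the sketch below. The name enumerate_nested is hypothetical, and for brevity this version only descends into lists rather than arbitrary iterables.

```python
def enumerate_nested(iterable):
    """Yield (top_level_index, item) pairs, index first like enumerate()."""
    cursor_stack = [iter(iterable)]
    index = -1
    while cursor_stack:
        try:
            item = next(cursor_stack[-1])
        except StopIteration:
            cursor_stack.pop()
            continue
        if len(cursor_stack) == 1:
            index += 1  # a new item of the top-level list
        if isinstance(item, list):
            cursor_stack.append(iter(item))
        else:
            yield index, item

nested = ['', '', '', ['', [['a', 'b'], ['c']]], [[['a', 'b'], ['c']]], [[['d']]]]
pairs = list(enumerate_nested(nested))
for index, item in pairs:
    print(index, repr(item))
```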
Important remark: unless you are certain your lists will not be nested too deeply, you shouldn't use any recursion-based solution. Otherwise, as soon as you have a nested list with depth greater than 1000, your code will crash. I explain this here. Note that even a simple call to str(list) will crash on a test case with depth > 1000 (for some Python implementations the limit is higher, but it is always bounded). Because of how the Python call stack works, the typical exception you get with recursion-based solutions is:
RecursionError: maximum recursion depth exceeded ...
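To make the failure concrete, here is a sketch of a naive recursive flattener run against a list nested slightly deeper than the interpreter's recursion limit. Both flatten_recursive and build_nested are illustrative helpers, and the depth is derived from sys.getrecursionlimit() rather than hard-coded.

```python
import sys

def flatten_recursive(items, out=None):
    # Naive recursive flattening: one Python call per nesting level.
    if out is None:
        out = []
    for item in items:
        if isinstance(item, list):
            flatten_recursive(item, out)
        else:
            out.append(item)
    return out

def build_nested(depth):
    # Builds [depth-1, [depth-2, [... [0]]]] iteratively.
    nested = [0]
    for d in range(1, depth):
        nested = [d, nested]
    return nested

shallow_result = flatten_recursive([1, [2, [3]]])

deep = build_nested(sys.getrecursionlimit() + 100)
try:
    flatten_recursive(deep)
    crashed = False
except RecursionError:
    crashed = True

print(shallow_result, crashed)
```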
Implementation details
I'll go step by step: first we will flatten a list, then output both the flattened list and the depth of each item, and finally output both the list and the corresponding item indexes in the "main list".
Flattening list
The iterative approach is well suited to this problem. Start from a simple (non-recursive) list flattening algorithm:
def flatten(iterable):
    return list(items_from(iterable))
def items_from(iterable):
    cursor_stack = [iter(iterable)]
    while cursor_stack:
        sub_iterable = cursor_stack[-1]
        try:
            item = next(sub_iterable)
        except StopIteration: # post-order
            cursor_stack.pop()
            continue
        if isinstance(item, list): # pre-order
            cursor_stack.append(iter(item))
        else:
            yield item # in-order
Computing depth
We can access the depth by looking at the stack size: depth = len(cursor_stack) - 1. Only the final branch of items_from changes:
        else:
            yield item, len(cursor_stack) - 1 # in-order
This returns an iterator over (item, depth) pairs. If we need to separate the result into two sequences, we can use the zip function:
>>> a = [1, 2, 3, [4 , [[5, 6], [7]]], [[[8, 9], [10]]], [[[11]]]]
>>> flatten(a)
[(1, 0), (2, 0), (3, 0), (4, 1), (5, 3), (6, 3), (7, 3), (8, 3), (9, 3), (10, 3), (11, 3)]
>>> flat_list, depths = zip(*flatten(a))
>>> print(flat_list)
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
>>> print(depths)
(0, 0, 0, 1, 3, 3, 3, 3, 3, 3, 3)
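Put together, the depth-tracking generator might look like the following sketch. The name items_with_depth is an assumption; the body is the flattening loop from the previous step with the modified yield.

```python
def items_with_depth(iterable):
    cursor_stack = [iter(iterable)]
    while cursor_stack:
        try:
            item = next(cursor_stack[-1])
        except StopIteration:
            cursor_stack.pop()  # finished this nesting level
            continue
        if isinstance(item, list):
            cursor_stack.append(iter(item))  # descend one level
        else:
            # The stack holds one iterator per nesting level,
            # so its size minus one is the item's depth.
            yield item, len(cursor_stack) - 1

a = [1, 2, 3, [4, [[5, 6], [7]]], [[[8, 9], [10]]], [[[11]]]]
flat_list, depths = zip(*items_with_depth(a))
print(flat_list)
print(depths)
```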
We will now do something similar to have item indexes instead of the depth.
Computing item indexes
To instead compute item indexes (in the main list), you'll need to count the number of items you've seen so far, which can be done by adding 1 to an item_index every time we iterate over an item that is at depth 0 (when the stack size is equal to 1):
def flatten(iterable):
    return list(items_from(iterable))
def items_from(iterable):
    cursor_stack = [iter(iterable)]
    item_index = -1
    while cursor_stack:
        sub_iterable = cursor_stack[-1]
        try:
            item = next(sub_iterable)
        except StopIteration: # post-order
            cursor_stack.pop()
            continue
        if len(cursor_stack) == 1: # If current item is in "main" list
            item_index += 1
        if isinstance(item, list): # pre-order
            cursor_stack.append(iter(item))
        else:
            yield item, item_index # in-order
Similarly, we split the pairs into two iterators using zip, and use map to convert both iterators to lists:
>>> a = [1, 2, 3, [4 , [[5, 6], [7]]], [[[8, 9], [10]]], [[[11]]]]
>>> flat_list, item_indexes = map(list, zip(*flatten(a)))
>>> print(flat_list)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>>> print(item_indexes)
[0, 1, 2, 3, 3, 3, 3, 4, 4, 4, 5]
Improvement: handling iterable inputs
Being able to accept a broader range of nested iterables as input could be desirable (especially if you build this for others to use). For example, the current implementation doesn't work as expected when the input contains nested iterators:
>>> a = iter([1, '2', 3, iter([4, [[5, 6], [7]]])])
>>> flat_list, item_indexes = map(list, zip(*flatten(a)))
>>> print(flat_list)
[1, '2', 3, <list_iterator object at 0x100f6a390>]
>>> print(item_indexes)
[0, 1, 2, 3]
If we want this to work we need to be a bit careful, because strings are iterable but we want them to be treated as atomic items (not as lists of characters). Instead of assuming the input is a list as we did before:
        if isinstance(item, list): # pre-order
            cursor_stack.append(iter(item))
        else:
            yield item, item_index # in-order
we will not inspect the input type. Instead, we try to use the item as an iterable, and if that fails we know it is not one (duck typing):
        if not isinstance(item, str):
            try:
                cursor_stack.append(iter(item))
                continue
            # item is not an iterable object:
            except TypeError:
                pass
        yield item, item_index
With this implementation, we have:
>>> a = iter([1, 2, 3, iter([4, [[5, 6], [7]]])])
>>> flat_list, item_indexes = map(list, zip(*flatten(a)))
>>> print(flat_list)
[1, 2, 3, 4, 5, 6, 7]
>>> print(item_indexes)
[0, 1, 2, 3, 3, 3, 3]
Building test cases
If you need to generate test cases with large depths, you can use this piece of code:
def build_deep_list(depth):
    """Returns a list of the form $l_{depth} = [depth-1, l_{depth-1}]$
    with $depth > 1$ and $l_1 = [0]$.
    """
    sub_list = [0]
    for d in range(1, depth):
        sub_list = [d, sub_list]
    return sub_list
You can use this to make sure my implementation doesn't crash when the depth is large:
a = build_deep_list(1200)
flat_list, item_indexes = map(list, zip(*flatten(a)))
We can also check that we can't print such a list by using the str function:
>>> a = build_deep_list(1200)
>>> str(a)
RecursionError: maximum recursion depth exceeded while getting the repr of an object
Function repr is called by str(list) on every element from the input list.
Concluding remarks
In the end, I agree that recursive implementations are much easier to read (the call stack does half the hard work for us), but when implementing a low-level function like this I think it is a good investment to have code that works in all cases (or at least all the cases you can think of), especially when the solution is not that hard. It's also a way not to forget how to write non-recursive code for tree-like structures (which may not come up often unless you implement data structures yourself, but it's a good exercise).
Note that everything I say "against" recursion holds only because Python doesn't optimize call stack usage for recursion (see Tail Recursion Elimination in Python), whereas many compiled languages perform tail call optimization (TCO). This means that even if you write a perfectly tail-recursive function in Python, it will crash on deeply nested lists.
If you need more details on the list flattening algorithm you can refer to the post I linked.
Simple and elegant solution:
def flat(main_list):
    res = []
    index = []
    for main_index in range(len(main_list)):
        # Check if element is a String
        if isinstance(main_list[main_index], str):
            res.append(main_list[main_index])
            index.append(main_index)
        # Check if element is a List
        else:
            sub_list = str(main_list[main_index]).replace('[', '').replace(']', '').replace(" ", '').replace("'", '').split(',')
            res += sub_list
            index += ([main_index] * len(sub_list))
    return res, index
This does the job, but if you want the lists to be returned instead, I'll enhance it for you.
from pprint import pprint
ar = ["","","",["",[["a","b"],["c"]]],[[["a","b"],["c"]]],[[["d"]]]]
flat = []
indices= []
def squash(arr,indx=-1):
    for ind,item in enumerate(arr):
        if isinstance(item, list):
            squash(item,ind if indx==-1 else indx)
        else:
            flat.append(item)
            indices.append(ind if indx==-1 else indx)
squash(ar)
pprint(ar)
pprint(flat)
pprint(indices)
EDIT
And this version returns the lists instead of keeping them in module-level variables:
from pprint import pprint
ar = ["","","",["",[["a","b"],["c"]]],[[["a","b"],["c"]]],[[["d"]]]]
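def squash(arr,indx=-1,fl=None,indc=None):
    # Create fresh lists on each top-level call; mutable default
    # arguments would be shared between separate calls
    if fl is None:
        fl = []
    if indc is None:
        indc = []
    for ind,item in enumerate(arr):
        if isinstance(item, list):
            fl,indc = squash(item,ind if indx==-1 else indx, fl, indc)
        else:
            fl.append(item)
            indc.append(ind if indx==-1 else indx)
    return fl,indc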
flat,indices = squash(ar)
pprint(ar)
pprint(flat)
pprint(indices)
I don't expect you would need more than 1k recursion depth, which is the default limit.
