Keys for klepto dir_archive? - klepto

I've created a klepto dir_archive.
On subsequent archive access, how can the archive keys be determined without loading the entire archive into memory?

Something like this?
>>> import klepto as kl
>>> kl.archives.dir_archive()
dir_archive('memo', {}, cached=True)
>>> d = _
>>> d['a'] = 0
>>> d['b'] = 1
>>> d['c'] = 2
>>> d
dir_archive('memo', {'a': 0, 'c': 2, 'b': 1}, cached=True)
>>> d.dump()
>>>
Then restart the session...
>>> import klepto as kl
>>> d = kl.archives.dir_archive()
>>> d
dir_archive('memo', {}, cached=True)
>>> d.archive.keys()
['a', 'c', 'b']
There are also several private methods, if you'd need something peculiar:
>>> d.archive._keydict()
{'a': None, 'c': None, 'b': None}
But, the main point is: you can easily interact with the dir_archive without loading it, by using the archive attribute.

Related

How to do sorting in Python3 for a list dictionary inside?

I have a list of the dictionary as follows:
[{"A":5,"B":10},
{"A":6,"B":13},
{"A":10,"B":5}]
I want to this list in decending order on the value of B. The output should look like this:
[{"A":6,"B":13},
{"A":5,"B":10},
{"A":10,"B":5}]
How to do that?
You can sort lists by the results of applying a function to each element: https://docs.python.org/3.9/library/functions.html#sorted
>>> data = [{"A":5,"B":10},
... {"A":6,"B":13},
... {"A":10,"B":5}]
>>> sorted(data, key=lambda dct: dct["B"], reverse=True)
[{'A': 6, 'B': 13}, {'A': 5, 'B': 10}, {'A': 10, 'B': 5}]

Python Groupby keys list of dictionaries

I have the following for instance:
x = [{'A':1},{'A':1},{'A':2},{'B':1},{'B':1},{'B':2},{'B':3},{'C':1},{'D':1}]
and I would like to get a dictionary like this:
x = [{'A': [1,2], 'B': [1,2,3], 'C':[1], 'D': [1]}]
Do you have any idea how I could get this please?
You could use a collections.defaultdict of sets to collect unique values, then convert the final result to a dictionary with values as lists using a dict comprehension:
from collections import defaultdict
lst = [{'A':1},{'A':1},{'A':2},{'B':1},{'B':1},{'B':2},{'B':3},{'C':1},{'D':1}]
result = defaultdict(set)
for dic in lst:
for key, value in dic.items():
result[key].add(value)
print({key: list(value) for key, value in result.items()})
Output:
{'A': [1, 2], 'B': [1, 2, 3], 'C': [1], 'D': [1]}
Although its probably better to add your data directly to the defaultdict to begin with, instead of creating a list of singleton dictionaries(don't recommend this data structure) then converting the result.
Using dict.setdefault
Ex:
x = [{'A':1},{'A':1},{'A':2},{'B':1},{'B':1},{'B':2},{'B':3},{'C':1},{'D':1}]
res = {}
for i in x:
for k, v in i.items():
res.setdefault(k, set()).add(v)
#or res = [{k: list(v) for k, v in res.items()}]
print(res)
Output:
{'A': {1, 2}, 'B': {1, 2, 3}, 'C': {1}, 'D': {1}}

populating a dictionary of lists behavior [duplicate]

My attempt to programmatically create a dictionary of lists is failing to allow me to individually address dictionary keys. Whenever I create the dictionary of lists and try to append to one key, all of them are updated. Here's a very simple test case:
data = {}
data = data.fromkeys(range(2),[])
data[1].append('hello')
print data
Actual result: {0: ['hello'], 1: ['hello']}
Expected result: {0: [], 1: ['hello']}
Here's what works
data = {0:[],1:[]}
data[1].append('hello')
print data
Actual and Expected Result: {0: [], 1: ['hello']}
Why is the fromkeys method not working as expected?
When [] is passed as the second argument to dict.fromkeys(), all values in the resulting dict will be the same list object.
In Python 2.7 or above, use a dict comprehension instead:
data = {k: [] for k in range(2)}
In earlier versions of Python, there is no dict comprehension, but a list comprehension can be passed to the dict constructor instead:
data = dict([(k, []) for k in range(2)])
In 2.4-2.6, it is also possible to pass a generator expression to dict, and the surrounding parentheses can be dropped:
data = dict((k, []) for k in range(2))
Try using a defaultdict instead:
from collections import defaultdict
data = defaultdict(list)
data[1].append('hello')
This way, the keys don't need to be initialized with empty lists ahead of time. The defaultdict() object instead calls the factory function given to it, every time a key is accessed that doesn't exist yet. So, in this example, attempting to access data[1] triggers data[1] = list() internally, giving that key a new empty list as its value.
The original code with .fromkeys shares one (mutable) list. Similarly,
alist = [1]
data = dict.fromkeys(range(2), alist)
alist.append(2)
print(data)
would output {0: [1, 2], 1: [1, 2]}. This is called out in the dict.fromkeys() documentation:
All of the values refer to just a single instance, so it generally doesn’t make sense for value to be a mutable object such as an empty list.
Another option is to use the dict.setdefault() method, which retrieves the value for a key after first checking it exists and setting a default if it doesn't. .append can then be called on the result:
data = {}
data.setdefault(1, []).append('hello')
Finally, to create a dictionary from a list of known keys and a given "template" list (where each value should start with the same elements, but be a distinct list), use a dictionary comprehension and copy the initial list:
alist = [1]
data = {key: alist[:] for key in range(2)}
Here, alist[:] creates a shallow copy of alist, and this is done separately for each value. See How do I clone a list so that it doesn't change unexpectedly after assignment? for more techniques for copying the list.
You could use a dict comprehension:
>>> keys = ['a','b','c']
>>> value = [0, 0]
>>> {key: list(value) for key in keys}
{'a': [0, 0], 'b': [0, 0], 'c': [0, 0]}
This answer is here to explain this behavior to anyone flummoxed by the results they get of trying to instantiate a dict with fromkeys() with a mutable default value in that dict.
Consider:
#Python 3.4.3 (default, Nov 17 2016, 01:08:31)
# start by validating that different variables pointing to an
# empty mutable are indeed different references.
>>> l1 = []
>>> l2 = []
>>> id(l1)
140150323815176
>>> id(l2)
140150324024968
so any change to l1 will not affect l2 and vice versa.
this would be true for any mutable so far, including a dict.
# create a new dict from an iterable of keys
>>> dict1 = dict.fromkeys(['a', 'b', 'c'], [])
>>> dict1
{'c': [], 'b': [], 'a': []}
this can be a handy function.
here we are assigning to each key a default value which also happens to be an empty list.
# the dict has its own id.
>>> id(dict1)
140150327601160
# but look at the ids of the values.
>>> id(dict1['a'])
140150323816328
>>> id(dict1['b'])
140150323816328
>>> id(dict1['c'])
140150323816328
Indeed they are all using the same ref!
A change to one is a change to all, since they are in fact the same object!
>>> dict1['a'].append('apples')
>>> dict1
{'c': ['apples'], 'b': ['apples'], 'a': ['apples']}
>>> id(dict1['a'])
>>> 140150323816328
>>> id(dict1['b'])
140150323816328
>>> id(dict1['c'])
140150323816328
for many, this was not what was intended!
Now let's try it with making an explicit copy of the list being used as a the default value.
>>> empty_list = []
>>> id(empty_list)
140150324169864
and now create a dict with a copy of empty_list.
>>> dict2 = dict.fromkeys(['a', 'b', 'c'], empty_list[:])
>>> id(dict2)
140150323831432
>>> id(dict2['a'])
140150327184328
>>> id(dict2['b'])
140150327184328
>>> id(dict2['c'])
140150327184328
>>> dict2['a'].append('apples')
>>> dict2
{'c': ['apples'], 'b': ['apples'], 'a': ['apples']}
Still no joy!
I hear someone shout, it's because I used an empty list!
>>> not_empty_list = [0]
>>> dict3 = dict.fromkeys(['a', 'b', 'c'], not_empty_list[:])
>>> dict3
{'c': [0], 'b': [0], 'a': [0]}
>>> dict3['a'].append('apples')
>>> dict3
{'c': [0, 'apples'], 'b': [0, 'apples'], 'a': [0, 'apples']}
The default behavior of fromkeys() is to assign None to the value.
>>> dict4 = dict.fromkeys(['a', 'b', 'c'])
>>> dict4
{'c': None, 'b': None, 'a': None}
>>> id(dict4['a'])
9901984
>>> id(dict4['b'])
9901984
>>> id(dict4['c'])
9901984
Indeed, all of the values are the same (and the only!) None.
Now, let's iterate, in one of a myriad number of ways, through the dict and change the value.
>>> for k, _ in dict4.items():
... dict4[k] = []
>>> dict4
{'c': [], 'b': [], 'a': []}
Hmm. Looks the same as before!
>>> id(dict4['a'])
140150318876488
>>> id(dict4['b'])
140150324122824
>>> id(dict4['c'])
140150294277576
>>> dict4['a'].append('apples')
>>> dict4
>>> {'c': [], 'b': [], 'a': ['apples']}
But they are indeed different []s, which was in this case the intended result.
You can use this:
l = ['a', 'b', 'c']
d = dict((k, [0, 0]) for k in l)
You are populating your dictionaries with references to a single list so when you update it, the update is reflected across all the references. Try a dictionary comprehension instead. See
Create a dictionary with list comprehension in Python
d = {k : v for k in blah blah blah}
You could use this:
data[:1] = ['hello']

Dictionary copy() - is shallow deep sometimes?

According to the official docs the dictionary copy is shallow, i.e. it returns a new dictionary that contains the same key-value pairs:
dict1 = {1: "a", 2: "b", 3: "c"}
dict1_alias = dict1
dict1_shallow_copy = dict1.copy()
My understanding is that if we del an element of dict1 both dict1_alias & dict1_shallow_copy should be affected; however, a deepcopy would not.
del dict1[2]
print(dict1)
>>> {1: 'a', 3: 'c'}
print(dict1_alias)
>>> {1: 'a', 3: 'c'}
But dict1_shallow_copy 2nd element is still there!
print(dict1_shallow_copy)
>>> {1: 'a', 2: 'b', 3: 'c'}
What am I missing?
A shallow copy means that the elements themselves are the same, just not the dictionary itself.
>>> a = {'a':[1, 2, 3], #create a list instance at a['a']
'b':4,
'c':'efd'}
>>> b = a.copy() #shallow copy a
>>> b['a'].append(2) #change b['a']
>>> b['a']
[1, 2, 3, 2]
>>> a['a'] #a['a'] changes too, it refers to the same list
[1, 2, 3, 2]
>>> del b['b'] #here we do not change b['b'], we change b
>>> b
{'a': [1, 2, 3, 2], 'c': 'efd'}
>>> a #so a remains unchanged
{'a': [1, 2, 3, 2], 'b': 4, 'c': 'efd'}1

nested dictionary based on 3 dataframe columns

im trying to build a nested dictionary based on 3 pandas df columns:
dataframe: stops
columns: 'direction' (1-2) ,'stop_num'(1-23 if the direction is 1 and 100-2300 if direction is 2),'name_eng'
what i was trying to do is:
dct = {x: {y:z} for x, y, z in zip(stops['direction'],stops['name_eng'],stops['stop_num'])}
the result i get is a nested dictionary indeed but for unknown reason i get only the last value in y:z so the dictionary look like:
{1:{1:'aaa'},'2:{100:'bbb'}}
any idea what am i doing wrong?
what i need is a nested dictionary with two dictionaries for each direction.
thanks!
Imagine your columns are:
1 a 1a
1 b 1b
2 a 2a
2 b 2b
Now, try your code:
>>> {x: {y:z} for x, y, z in zip([1,1,2,2], ['a', 'b', 'a', 'b'], ['1a', '1b', '2a', '2b'])}
{1: {'b': '1b'}, 2: {'b': '2b'}}
You have a loop over the tuples: (1, 'a', '1a'), (1, 'b', '1b'), (2, 'a', '2a'), (2, 'b', '2b').
The first element of the tuple is the "main" key of your dictionary. Thus, the dict is {1: {'a':'1a'}} after the first tuple.
Then comes (1, 'b', '1b'). The value of the main key 1 is overwritten and the dict becomes: {1: {'b':'1b'}}.
The next steps are: {1: {'b':'1b'}, 2: {'a': '2a'}} and {1: {'b': '1b'}, 2: {'b': '2b'}}
To avoid the overwrite, you can do:
>>> d = {}
>>> for x, y, z in zip([1,1,2,2], ['a', 'b', 'a', 'b'], ['1a', '1b', '2a', '2b']):
... d.setdefault(x, {}).update({y:z})
...
>>> d
{1: {'a': '1a', 'b': '1b'}, 2: {'a': '2a', 'b': '2b'}}
The idea is to create a new dict for every new main key (setdefault(..., {})) and to update the dict associated with the main key (update({y:z})).
If you want a dict comprehension, this one will work:
>>> {x: {y:z for k, y, z in zip([1,1,2,2], ['a', 'b', 'a', 'b'], ['1a', '1b', '2a', '2b']) if k==x} for x in set([1,1,2,2])}
{1: {'a': '1a', 'b': '1b'}, 2: {'a': '2a', 'b': '2b'}}
But it's far less efficient than the for loop because you loop once over the first column to get the main keys, then once again over all the rows, for every main key.

Resources