Colander: How do I handle null values for nested schema - python-3.5

Using colander 1.5.1, if I pass null to an attribute defined by a nested schema:
class ChildSchema(colander.Schema):
    a = colander.SchemaNode(colander.Integer(), missing=None)
    b = colander.SchemaNode(colander.Integer(), missing=None)
class ParentSchema(colander.Schema):
    c = colander.SchemaNode(colander.Integer(), missing=None)
    d = ChildSchema(missing=None)
Example JSON:
{
"c": 1,
"d": null
}
Then I get this error when deserialising:
"\"None\" is not a mapping type: Does not implement dict-like functionality."
Omitting the attribute d works as expected and deserialises to None. How do I correctly handle deserialising a null value passed to a nested schema? Based on the documentation, I would expect it to return None.
Deserialization Combinations

Please consider the following:
You need the following imports.
import colander
from colander import SchemaType, Invalid, null
Below is the same as your ChildSchema:
class ChildSchema(colander.Schema):
    a = colander.SchemaNode(colander.Integer(), missing=None)
    b = colander.SchemaNode(colander.Integer(), missing=None)
Below is where all the magic happens: we create our own type. Note that it may lack some of the basic functionality that colander's built-in SchemaTypes offer (such as serialize). If the received value is null or None, it is returned unchanged. If it is neither and is not a dictionary, an Invalid error is raised; if it is a dictionary, it is deserialized with your ChildSchema. Combined with missing=None on the parent node, this means a nested attribute that is null or None ({"d": null}) deserializes to None instead of raising an error.
class MyType(SchemaType):
    def deserialize(self, node, cstruct):
        if cstruct is null:
            return null
        if cstruct is None:
            return None
        if not isinstance(cstruct, dict):
            raise Invalid(node, '%r is not an object' % cstruct)
        else:
            return ChildSchema().deserialize(cstruct)
We create the Parent Schema using the magic type:
class ParentSchema(colander.Schema):
    c = colander.SchemaNode(colander.Integer(), missing=None)
    d = colander.SchemaNode(MyType(), missing=None)
Now "d" will be deserialized as you wish. Let's look at some examples of its usage:
schema = ParentSchema()
Example 1. "d" is null (The main question)
schema.deserialize({"c": 1, "d": null})
Output 1
{"c": 1, "d": None}
Example 2. "d" is None
schema.deserialize({"c": 1, "d": None})
Output 2
{"c": 1, "d": None}
Example 3. Normal behaviour
schema.deserialize({'c': 1, 'd': {'a': 1}})
Output 3.
{'c': 1, 'd': {'a': 1, 'b': None}}
Example 4. Error: "d" is not a dict
schema.deserialize({'c': 1, 'd': []})
Output 4
# Invalid: {'d': '[] is not an object'}
Example 5. Error: "a" is not a number
schema.deserialize({'c': 1, 'd': {'a': "foobar"}})
Output 5
# Invalid: {'a': '"foobar" is not a number'}
To write this answer I used https://docs.pylonsproject.org/projects/colander/en/latest/extending.html as a source.

Related

How to access to value from multiple keys for a dictionary in python3?

I have a multidimensional dictionary, for example:
a = {
    'b': {
        'c': {
            'd': 1
        }
    }
}
How can I access a value dynamically from a specific path inside a function? For example:
def get_value(path):
    ???
How should I pass the keys for a specific depth, e.g. as 'b.c.d' or as ['b', 'c', 'd']?
You can use recursion:
def get_value(dct, path):
    if len(path) == 1:
        return dct[path[0]]
    return get_value(dct[path[0]], path[1:])
a = {"b": {"c": {"d": 1}}}
print(get_value(a, ["b", "c", "d"]))
Prints:
1
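The same lookup can also be done without recursion, which sidesteps Python's recursion limit for very deep paths. A minimal sketch, assuming the path is given either as a list of keys or as a dot-separated string such as 'b.c.d' (the dotted form only works when no key itself contains a dot); `get_value_by_path` is a hypothetical name:

```python
from functools import reduce
import operator


def get_value_by_path(dct, path):
    """Follow `path` through nested dicts and return the value found there.

    `path` may be a list/tuple of keys or a dot-separated string like 'b.c.d'.
    Raises KeyError if any key along the path is missing.
    """
    if isinstance(path, str):
        path = path.split(".")  # assumption: no key contains a literal '.'
    # reduce applies dct[key0], then [key1], ... one level at a time
    return reduce(operator.getitem, path, dct)


a = {"b": {"c": {"d": 1}}}
print(get_value_by_path(a, ["b", "c", "d"]))  # 1
print(get_value_by_path(a, "b.c.d"))          # 1
```

Like the recursive version, a missing key raises KeyError from `operator.getitem`.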

How to check if multiple keys exists in nested (all) dictionary?

I want to check whether both 'b' and 'c' exist in 'a' and in 'd'.
test = {
    'a': {
        'b': [1234],
        'c': 'some_value'
    },
    'd': {
        'b': [5678],
        'c': ''
    }
}
Approach 1 works as shown below, but it is not a great implementation if there are many nested dictionaries. It also can't tell me exactly which elements are missing. Say 'c' is not in 'a', 'b' is not in 'd' and 'c' is not in 'd': the code fails at the second statement and never reports that the third and fourth statements would fail as well. I need to know every key that is missing.
try:
    v1 = test['a']['b']
    v2 = test['a']['c']
    v3 = test['d']['b']
    v4 = test['d']['c']
except Exception as err:
    print(err)
Approach 2:
for k, v in test.items():
    if 'b' not in v:
        print("'b' doesn't exist in {}".format(k))
    if 'c' not in v:
        print("'c' doesn't exist in {}".format(k))
Neither approach seems great. Are there better ways to handle this?
If there are only two levels of nesting, you could count the occurrences of keys in the lower-level dicts.
For example:
counter = {}
for el in test.keys():
    for subkey in test.get(el).keys():
        if subkey not in counter.keys():
            counter[subkey] = 1.0
        else:
            counter[subkey] += 1.0
This returns:
{'b': 2.0, 'c': 2.0}
Based on that, you can identify which nested keys occur more than once.
You could then use a set to see in which outer dicts the duplicated keys occur:
# dupe keys
counter = {k : v for k, v in counter.items() if v > 1}
#get only values with dupe records
{k:v for k, v in test.items() if len(set(counter.keys()).intersection(v)) > 0}
> {'a': {'b': [1234], 'c': 'some_value'}, 'd': {'b': [5678], 'c': ''}}
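To get what the question actually asks for (every missing key, not just the first failure), one option is to compare each inner dict against a set of required keys. A minimal sketch, assuming every value of the outer dict is itself a dict; `find_missing_keys` and `required` are hypothetical names, and the sample data reproduces the scenario from the question ('c' missing from 'a', both keys missing from 'd'):

```python
def find_missing_keys(outer, required):
    """Map each outer key to a sorted list of required keys its inner dict lacks.

    Outer keys whose inner dict has every required key are left out.
    """
    missing = {}
    for outer_key, inner in outer.items():
        absent = sorted(set(required) - set(inner))  # set(inner) is the inner dict's keys
        if absent:
            missing[outer_key] = absent
    return missing


test = {
    'a': {'b': [1234]},  # 'c' is missing
    'd': {},             # 'b' and 'c' are missing
}
for outer_key, absent in find_missing_keys(test, {'b', 'c'}).items():
    for key in absent:
        print("'{}' doesn't exist in '{}'".format(key, outer_key))
```

Unlike the try/except approach, nothing stops at the first failure, so every missing key is reported.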

Find the intersection of dict of dicts based on the rules in python3.x

I have two dictionaries as given below and want to find the intersection of dictionaries based on some logic.
dict1 = {"1": {"score1": 1.099, "score2": 0.45},
         "2": {"score2": 0.099, "score3": 1.45},
         "3": {"score2": 10, "score3": 10.45}}
dict2 = {"1": {"score6": 1.099, "score2": 0.45},
         "2": {"score2": 10, "score3": 10.45},
         "4": {"score5": 8, "score8": 15}}
I want to create a new dictionary from the two given dictionaries using the following rules:
1. Take the union of the two dictionaries based on the outer key.
2. If an outer key is common to both dictionaries, then in the nested key-value pairs keep only the keys common to both, each with its highest value across the two dictionaries.
result_dict = {"1": {"score2": 0.45},
               "3": {"score2": 10, "score3": 10.45},
               "2": {"score2": 10, "score3": 10.45},
               "4": {"score5": 8, "score8": 15}}
First off, thanks for providing concrete examples of what your inputs are like and what you'd like the output to look like.
There may well be more efficient ways of doing this, but since there's no mention of any constraints on performance, my first instinct was to turn to Python's set operations to make things a little simpler:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
dict1 = {
    "1": {
        "score1": 1.099,
        "score2": 0.45
    },
    "2": {
        "score2": 0.099,
        "score3": 1.45
    },
    "3": {
        "score2": 10,
        "score3": 10.45
    }
}
dict2 = {
    "1": {
        "score6": 1.099,
        "score2": 0.45
    },
    "2": {
        "score2": 10,
        "score3": 10.45
    },
    "4": {
        "score5": 8,
        "score8": 15
    }
}
result_dict = {
    "1": {
        "score2": 0.45
    },
    "3": {
        "score2": 10,
        "score3": 10.45
    },
    "2": {
        "score2": 10,
        "score3": 10.45
    },
    "4": {
        "score5": 8,
        "score8": 15
    }
}
def weird_union(d1, d2):
    """Applies the logic in OP's question
    Args:
        d1 (dict): dict with one level of nested dicts as values
        d2 (dict): dict with one level of nested dicts as values
    Returns: dict
    """
    result = {}
    k1, k2 = set(d1.keys()), set(d2.keys())
    # no collisions-- easy case
    for k in k1.symmetric_difference(k2):
        result[k] = d1[k] if k in d1 else d2[k]
    # key appears in both dicts
    for k in k1.intersection(k2):
        _k1, _k2 = set(d1[k].keys()), set(d2[k].keys())
        result[k] = {
            key: max([d1[k][key], d2[k][key]])
            for key in _k1.intersection(_k2)
        }
    return result
test = weird_union(dict1, dict2)
assert result_dict == test
print('Test passed.')
The basic idea is to treat the disjoint and the intersection cases separately. Hope this helps.
Update in response to comment:
In the future, please provide this sort of context up front; an operation on two dictionaries is rather different than an operation on an arbitrary number of inputs.
Here's one way to do it:
def invert_dicts(*dicts):
    """ Takes multiple dicts and returns a dict mapping
    key to dict index. E.g.,
    invert_dicts(
        {'a': 1, 'b': 2},
        {'a': 3, 'c': 4}
    )
    returns
    {'a': [0, 1], 'b': [0], 'c': [1]}
    """
    key_map = {}
    for i, d in enumerate(dicts):
        for k in d.keys():
            key_map.setdefault(k, []).append(i)
    return key_map
def weird_n_union(*dicts):
    """Applies the logic in OP's question to an arbitrary number of inputs
    >>> weird_n_union(d1, d2, ..., dn)
    Args:
        *dicts (dict): dictionaries w/one level of nested dicts as values
    Returns: dict
    """
    result = {}
    # dict mapping key to list of dict index in `dicts` containing key
    key_map = invert_dicts(*dicts)
    for k in key_map:
        # no outer key collision
        if len(key_map[k]) == 1:
            result[k] = dicts[key_map[k][0]][k]
        # outer key collision
        else:
            # unclear what should happen in the case where:
            # - there is an outer key collision
            # - there are no shared sub-keys
            #
            # this implementation assumes that in that case, the value for k is {}
            result.setdefault(k, {})
            sub_dicts = tuple(dicts[i][k] for i in key_map[k])
            # map keys in `sub_dicts` to indices for `dicts` containing key
            sub_key_map = invert_dicts(*sub_dicts)
            # contains elements of (k, v), where k appears in > 1 sub-dicts
            shared_keys_only = filter(lambda kv: len(kv[1]) > 1,
                                      sub_key_map.items())
            # update result with the max value for each shared key
            for kv in shared_keys_only:
                max_ = max(((kv[0], sub_dicts[i][kv[0]]) for i in kv[1]),
                           key=lambda x: x[1])
                result[k].update({max_[0]: max_[1]})
    return result
I tried to annotate the code to make it a bit clearer how things work. Hopefully this works for your use case.
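A more compact variant that follows the same rule as `weird_n_union` (on an outer key collision, an inner key is kept when it appears in more than one of the colliding sub-dicts, with its maximum value). This is a sketch rather than a tested replacement; `merge_max_shared` is a hypothetical name, and like the version above it yields `{}` for an outer key whose sub-dicts share no inner keys:

```python
from collections import Counter


def merge_max_shared(*dicts):
    """Union of outer keys; on collision keep only inner keys found in
    more than one sub-dict, each with its maximum value."""
    result = {}
    for k in set().union(*dicts):  # every outer key from every input
        subs = [d[k] for d in dicts if k in d]
        if len(subs) == 1:
            # no outer key collision: copy the single sub-dict as-is
            result[k] = dict(subs[0])
            continue
        # count how many sub-dicts contain each inner key
        counts = Counter(key for sub in subs for key in sub)
        result[k] = {
            key: max(sub[key] for sub in subs if key in sub)
            for key, n in counts.items()
            if n > 1
        }
    return result


print(merge_max_shared({"x": {"s": 1, "t": 2}},
                       {"x": {"s": 5}},
                       {"x": {"t": 0, "u": 9}}))  # {'x': {'s': 5, 't': 2}}
```

`Counter` replaces the second `invert_dicts` call: only the number of sub-dicts holding each inner key matters, not which ones.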

Print nested dictionary in python3

How can I print a nested python dictionary in a specific format?
My dictionary looks like this:
dictionary = {'Doc1':{word1: 3, word2: 1}, 'Doc2':{word1: 1, word2: 14, word3: 3}, 'Doc3':{word1: 2}}
I tried the following way:
for x, y in dictionary.items():
    print(x, ":", y)
But it prints:
Doc1: {word1:3, word2: 1}
Doc2: {word1:1, word2:14, word3:3}
Doc3: {word1:2}
How to get rid of the bracket and print the plain information?
I want to print it in the following format:
Doc1: word1:3; word2:1
Doc2: word1:1; word2:14; word3: 3
Doc3: word1:2;
In your case y is a dict, so to print it differently you could override the __repr__ (representation) of the object, e.g. by subclassing dict.
Alternatively, you can use some recursion here:
def print_d(dd):
    if not isinstance(dd, dict):
        return str(dd)
    else:
        res = []
        for x, y in dd.items():
            res.append(''.join((str(x), ':', print_d(y))))
        return '; '.join(res)
if __name__ == '__main__':
    dictionary = {'Doc1': {'word1': 3, 'word2': 1}, 'Doc2': {'word1': 1, 'word2': 14, 'word3': 3}, 'Doc3': {'word1': 2}}
    for x, y in dictionary.items():
        print("{}: {}".format(x, print_d(y)))
Aside from the fact that your original dictionary declaration is not valid python unless each word is a defined variable, this seems to work:
import json
print(json.dumps(dictionary).replace("{","").replace(',','').replace("}","\n").replace('"',''))
Result:
Doc1: word1: 3 word2: 1
Doc2: word1: 1 word2: 14 word3: 3
Doc3: word1: 2
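The JSON string replacement works on this sample, but it would also strip any {, }, comma or double quote that happens to occur inside a word. A minimal sketch that builds the requested "Doc1: word1:3; word2:1" lines directly with str.join instead, assuming one level of nesting as in the question; `format_counts` is a hypothetical helper name:

```python
def format_counts(counts):
    """Render {'word1': 3, 'word2': 1} as 'word1:3; word2:1'."""
    return "; ".join("{}:{}".format(word, n) for word, n in counts.items())


dictionary = {
    'Doc1': {'word1': 3, 'word2': 1},
    'Doc2': {'word1': 1, 'word2': 14, 'word3': 3},
    'Doc3': {'word1': 2},
}
for doc, counts in dictionary.items():
    print("{}: {}".format(doc, format_counts(counts)))
```

On Python 3.7+, where dicts keep insertion order, this prints the documents and words in the order they were defined.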

How do I create a default dictionary of dictionaries

I am trying to write some code that involves creating a default dictionary of dictionaries. However, I have no idea how to initialise/create such a thing. My current attempt looks something like this:
from collections import defaultdict
inner_dict = {}
dict_of_dicts = defaultdict(inner_dict(int))
I want to use this default dict of dictionaries as follows: for each pair of words that I produce from a file I open (e.g. [['M UH M', 'm oo m']]), each space-delimited segment of the first word becomes a key in the outer dictionary, and for each space-delimited segment of the second word I count its frequency under the corresponding key.
For example:
[['M UH M', 'm oo m']]
(<class 'dict'>, {'M': {'m': 2}, 'UH': {'oo': 1}})
Having just run this now it doesn't seem to have output any errors, however I was just wondering if something like this will actually produce a default dictionary of dictionaries.
Apologies if this is a duplicate, however previous answers to these questions have been confusing and in a different context.
To initialise a defaultdict that creates dictionaries as its default value you would use:
d = defaultdict(dict)
For this particular problem, a collections.Counter would be more suitable:
>>> from collections import defaultdict, Counter
>>> d = defaultdict(Counter)
>>> for a, b in zip(*[x.split() for x in ['M UH M', 'm oo m']]):
...     d[a][b] += 1
...
>>> print(d)
defaultdict(<class 'collections.Counter'>, {'M': Counter({'m': 2}), 'UH': Counter({'oo': 1})})
Edit
You expressed interest in a comment in the equivalent without a Counter. Here is the equivalent using a plain dict:
>>> from collections import defaultdict
>>> d = defaultdict(dict)
>>> for a, b in zip(*[x.split() for x in ['M UH M', 'm oo m']]):
...     d[a][b] = d[a].get(b, 0) + 1
...
>>> print(d)
defaultdict(<class 'dict'>, {'M': {'m': 2}, 'UH': {'oo': 1}})
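To apply the Counter approach to every pair read from the file, a minimal sketch; `count_segments` is a hypothetical name, it assumes the pairs are already loaded as two-element lists like ['M UH M', 'm oo m'], and it relies on zip, which silently stops at the shorter word if the two segment counts differ:

```python
from collections import Counter, defaultdict


def count_segments(pairs):
    """For each segment of the first word, count the aligned segments of the second."""
    counts = defaultdict(Counter)
    for first, second in pairs:
        # zip pairs segments by position and stops at the shorter sequence
        for a, b in zip(first.split(), second.split()):
            counts[a][b] += 1
    # convert to plain dicts so the result prints and compares like ordinary dicts
    return {a: dict(c) for a, c in counts.items()}


pairs = [['M UH M', 'm oo m'], ['UH M', 'uh m']]
print(count_segments(pairs))  # {'M': {'m': 3}, 'UH': {'oo': 1, 'uh': 1}}
```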
You could also use a normal dictionary and its setdefault method.
my_dict.setdefault(key, default) looks up my_dict[key] and ...
... if the key already exists, returns its current value without modifying it, or ...
... otherwise assigns the default value (my_dict[key] = default) and then returns it.
So whenever you want to get a value from your outer dictionary, you can call my_dict.setdefault(key, {}) instead of the normal my_dict[key]. This retrieves the real value assigned to the key if it's present, or else a new empty dictionary, which is automatically stored in your outer dictionary as well.
Example:
outer_dict = {"M": {"m": 2}}
inner_dict = outer_dict.setdefault("UH", {})
# outer_dict = {"M": {"m": 2}, "UH": {}}
# inner_dict = {}
inner_dict["oo"] = 1
# outer_dict = {"M": {"m": 2}, "UH": {"oo": 1}}
# inner_dict = {"oo": 1}
inner_dict = outer_dict.setdefault("UH", {})
# outer_dict = {"M": {"m": 2}, "UH": {"oo": 1}}
# inner_dict = {"oo": 1}
inner_dict["xy"] = 3
# outer_dict = {"M": {"m": 2}, "UH": {"oo": 1, "xy": 3}}
# inner_dict = {"oo": 1, "xy": 3}
This way you always get a valid inner_dict, either an empty default one or the one that's already present for the given key. As dictionaries are mutable data types, modifying the returned inner_dict will also modify the dictionary inside outer_dict.
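Applied to the question's counting problem, the same setdefault pattern looks like this. A minimal sketch with a plain dict and no imports; `count_with_setdefault` is a hypothetical name:

```python
def count_with_setdefault(first, second):
    """Count aligned segments using only a plain dict and setdefault."""
    outer_dict = {}
    for a, b in zip(first.split(), second.split()):
        # get (or create and store) the inner dict for segment `a`
        inner_dict = outer_dict.setdefault(a, {})
        inner_dict[b] = inner_dict.get(b, 0) + 1
    return outer_dict


print(count_with_setdefault('M UH M', 'm oo m'))  # {'M': {'m': 2}, 'UH': {'oo': 1}}
```

Because inner_dict is the same object stored in outer_dict, updating it updates the outer dictionary too.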
The other answers propose alternative solutions or show that you can make a default dictionary of dictionaries using d = defaultdict(dict),
but the question asked how to make a default dictionary of default dictionaries. My naive first attempt was this:
from collections import defaultdict
my_dict = defaultdict(defaultdict(list))
However, this throws an error: TypeError: first argument must be callable or None
So my second attempt, which works, makes a callable with the lambda keyword (an anonymous function):
from collections import defaultdict
my_dict = defaultdict(lambda: defaultdict(list))
which is more concise than the alternative method using a regular function:
from collections import defaultdict
def default_dict_maker():
    return defaultdict(list)
my_dict = defaultdict(default_dict_maker)
You can check that it works by assigning:
>>> my_dict[2][3] = 5
>>> my_dict[2][3]
5
or by trying to return a value:
>>> my_dict[0][0]
[]
>>> my_dict[5]
defaultdict(<class 'list'>, {})
tl;dr
Here is your one-line answer: my_dict = defaultdict(lambda: defaultdict(list))
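The lambda idea also extends to arbitrary depth: a factory that returns a defaultdict of itself creates new levels on demand. A sketch; `tree` and `to_dict` are hypothetical names, and note that a mistyped key silently creates a new empty level instead of raising KeyError:

```python
from collections import defaultdict


def tree():
    """A defaultdict whose missing values are themselves trees (any depth)."""
    return defaultdict(tree)


def to_dict(d):
    """Recursively convert a tree back into plain dicts for printing or comparison."""
    if isinstance(d, defaultdict):
        return {k: to_dict(v) for k, v in d.items()}
    return d


t = tree()
t['a']['b']['c'] = 1
t['a']['x'] = 2
print(to_dict(t))  # {'a': {'b': {'c': 1}, 'x': 2}}
```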

Resources