How would you use asyncpg with FastAPI to map returned values from a select query to pydantic models for output and validation?

I want to use FastAPI without an ORM (using asyncpg) and map the returned values from a select query to a pydantic model, so that the returned values are validated with pydantic and the response is structured like the pydantic model/schema.
I’ve tried looking for documentation on this but it’s pretty hard to find/not clear. I’d appreciate any help!

Every pydantic model inherits a couple of utility helpers to create objects. One is parse_obj which takes a dict and creates the model object from that.
parse_obj: this is very similar to the __init__ method of the model, except it takes a dict rather than keyword arguments. If the object passed is not a dict a ValidationError will be raised.
From the example on the linked section above:
from datetime import datetime

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name = 'John Doe'
    signup_ts: datetime = None

m = User.parse_obj({'id': 123, 'name': 'James'})
print(m)
#> id=123 signup_ts=None name='James'
You might be able to give parse_obj a Record directly, since it implements dict-like accessors, so just try it and see if it works. If not, you can use dict(<row record from asyncpg>) to convert the record to an actual dict first.
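A minimal, self-contained sketch of that approach (the User model, the column names, and the records_to_models helper are illustrative, not from the question):

```python
# Sketch: validate asyncpg rows through a pydantic model. The model,
# columns, and helper name below are assumptions for illustration.
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str

def records_to_models(rows) -> list:
    # asyncpg.Record supports mapping-style access, so dict(row) works;
    # parse_obj then validates and coerces each field.
    return [User.parse_obj(dict(row)) for row in rows]

# In a FastAPI route it would look roughly like this:
#
# @app.get("/users", response_model=list[User])
# async def list_users():
#     rows = await conn.fetch("SELECT id, name FROM users")
#     return records_to_models(rows)

# Any mapping stands in for a Record here:
print(records_to_models([{"id": 1, "name": "Alice"}, {"id": "2", "name": "Bob"}]))
```

With response_model=list[User], FastAPI would also serialize the validated models back into the response body for you.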

Related

What is the correct type hint to use when exporting a pydantic model as a dict?

I'm writing an abstraction module which validates an Excel sheet against a pydantic schema and returns the row as a dict using dict(MyCustomModel(**sheet_row)). I would like to use type hinting so any function that uses the abstraction methods gets a type hint for the returned dictionary with its keys, instead of just getting an unhelpful dict. Basically I'd like to return the keys of the dict that compose the schema, so I don't have to keep referring to the schema for its fields and can catch any errors early on.
My current workaround is having my abstraction library return the pydantic model directly and type hint using the model itself. This means every field has to be accessed using dot notation instead of being accessed like a regular dictionary. I cannot annotate the dict as being the model itself, as it's a dict, not the actual pydantic model, which has some extra attributes as well.
I tried type hinting with the type MyCustomModel.__dict__(). That resulted in the error TypeError: Parameters to generic types must be types. Got mappingproxy({'__config__': <class 'foo.bar.Config'>, '__fields__': {'lab.. Is there a way to send a type hint about the fields in the schema, but as a dictionary? I don't omit any keys during the dict export. All the fields in the model are present in the final dict being returned.
I am going to try to abstract that question and create a minimal reproducible example for you.
Question
Consider this working example:
from typing import Any

from pydantic import BaseModel

class Foo(BaseModel):
    x: str
    y: int

def validate(data: dict[str, Any], model: type[BaseModel]) -> dict[str, Any]:
    return dict(model.parse_obj(data))

def test() -> None:
    data = {"x": "spam", "y": "123"}
    validated = validate(data, Foo)
    print(validated)
    # reveal_type(validated["x"])
    # reveal_type(validated["y"])

if __name__ == "__main__":
    test()
The code works fine and outputs {'x': 'spam', 'y': 123} as expected. But if you uncomment the reveal_type lines and run mypy over it, the type it sees is obviously just Any for both.
Is there a way to annotate validate so that a type checker knows which keys will be present in the returned dictionary, based on the model provided to it?
Answer
Python dictionaries have no mechanism built into them for distinguishing their type via specific keys. The generic dict type is parameterized by exactly two type parameters, namely the key type and the value type.
You can utilize the typing.TypedDict class to define a type based on the specific keys of a dictionary. However (as pointed out by @hernán-alarcón in the comments), the dict export still returns just a dict[str, Any]. You can always cast the output of course, and for this particular Foo model this would work:
from typing import Any, TypedDict, cast

from pydantic import BaseModel

class Foo(BaseModel):
    x: str
    y: int

class FooDict(TypedDict):
    x: str
    y: int

def validate(data: dict[str, Any], model: type[BaseModel]) -> FooDict:
    return cast(FooDict, dict(model.parse_obj(data)))

def test() -> None:
    data = {"x": "spam", "y": "123"}
    validated = validate(data, Foo)
    print(validated)
    reveal_type(validated["x"])  # "builtins.str"
    reveal_type(validated["y"])  # "builtins.int"

if __name__ == "__main__":
    test()
But it is not very helpful, if validate should be able to deal with any model, not just Foo.
The easiest way to generalize this that I can think of is to make your own base model class that is generic in terms of the corresponding TypedDict. Binding the type argument in a dedicated private attribute should be enough. You won't actually have to set it or interact with it at any point. It is enough to specify it, when you subclass your base class. Here is a working example:
from typing import Any, Generic, TypeVar, TypedDict, cast

from pydantic import BaseModel as PydanticBaseModel, PrivateAttr

T = TypeVar("T")

class BaseModel(PydanticBaseModel, Generic[T]):
    __typed_dict__: type[T] = PrivateAttr(...)

class FooDict(TypedDict):
    x: str
    y: int

class Foo(BaseModel[FooDict]):
    x: str
    y: int

def validate(data: dict[str, Any], model: type[BaseModel[T]]) -> T:
    return cast(T, model.parse_obj(data).dict())

def test() -> None:
    data = {"x": "spam", "y": "123"}
    validated = validate(data, Foo)
    print(validated)
    reveal_type(validated["x"])  # "builtins.str"
    reveal_type(validated["y"])  # "builtins.int"
    reveal_type(validated)  # "TypedDict('FooDict', {'x': builtins.str, 'y': builtins.int})"

if __name__ == "__main__":
    test()
This works well enough to convey the dictionary keys and corresponding types.
If you are wondering whether there is a way to just dynamically infer the TypedDict rather than duplicating the model fields manually, the answer is no.
Static type checkers do not execute your code; they just read it.
This brings me to the final consideration. I don't know why you would even want to use a dictionary over a model instance in the first place. It seems that for the purposes of dealing with structured data, the model is superior in every aspect, if you are already using Pydantic anyway.
The fact that you access the fields as attributes (via dot notation) is a feature IMHO, not a drawback of this approach. If for some reason you do need dynamic attribute access via field names as strings, you can always just use getattr on the model instance.
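A tiny sketch of that getattr suggestion (the Foo model here is made up for illustration):

```python
# Dynamic attribute access on a pydantic model via field names as strings;
# the Foo model is hypothetical.
from pydantic import BaseModel

class Foo(BaseModel):
    x: str
    y: int

foo = Foo(x="spam", y=123)

# __fields__ lists the model's field names, so no dict export is needed
for field_name in Foo.__fields__:
    print(field_name, getattr(foo, field_name))
```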

Can't use Pydantic model attributes on type hinting

Like I used to do with FastAPI routes, I want to make a function that is expecting a dict. I want to type hint like in FastAPI with a Pydantic model.
Note that I am just using FastAPI as a reference here and this app serves a total different purpose.
What I did:
models.py
from pydantic import BaseModel

class Mymodel(BaseModel):
    name: str
    age: int
main.py
def myfunc(m: Mymodel):
    print(m)
    print(m.name)

myfunc({"name": "abcd", "age": 3})
It prints m as a normal dict, not a Mymodel, and m.name just throws an AttributeError.
I don't understand why it is behaving like this, because the same code would work in FastAPI. Am I missing something here? What should I do to make this work?
I am expecting a dict arg in the func, and I want to type hint it with a class inherited from pydantic's BaseModel. Then I want to access the attributes of that class.
I don't want to do:
def myfunc(m):
m = Mymodel(**m)
Thank You.
from pydantic import BaseModel
from pydantic import validate_arguments

class Mymodel(BaseModel):
    name: str
    age: int

@validate_arguments
def myfunc(m: Mymodel):
    print(m)
    print(m.name)

myfunc({"name": "abcd", "age": 3})
This might be what you are looking for: https://pydantic-docs.helpmanual.io/usage/validation_decorator/
Since you pass a dict to your custom function, the attribute should be accessed in the following way:
print(m['name'])
# or
print(m.get('name'))
Otherwise, to use m.name instead, you need to parse the dict into the corresponding Pydantic model before passing it to the function, as shown below:
data = {"name": "abcd", "age": 3}

myfunc(Mymodel(**data))
# or
myfunc(Mymodel.parse_obj(data))
The reason that passing {"name":"abcd", "age":3} in FastAPI and later accessing the attributes using the dot operator (e.g., m.name) works is that FastAPI does the above parsing and validation internally, as soon as a request arrives. This is also the reason you can then convert it back to a dictionary in your endpoint, using m.dict(). Try, for example, passing an incorrect key, e.g., myfunc(Mymodel(**{"name":"abcd","MYage":3})): you would get a field required (type=value_error.missing) error (as part of Pydantic's error handling), similar to what FastAPI would return (as shown below) if a similar request attempted to go through (you could also test that through the Swagger UI autodocs at http://127.0.0.1:8000/docs). Otherwise, any dictionary passed by the user (in the way you show in the question) would go through without throwing an error, in case it didn't match the Pydantic model.
{
    "detail": [
        {
            "loc": [
                "body",
                "age"
            ],
            "msg": "field required",
            "type": "value_error.missing"
        }
    ]
}
You could alternatively use Pydantic's validation decorator (i.e., @validate_arguments) on your custom function. As per the documentation:
The validate_arguments decorator allows the arguments passed to a function to be parsed and validated using the function's annotations before the function is called. While under the hood this uses the same approach of model creation and initialisation; it provides an extremely easy way to apply validation to your code with minimal boilerplate.
Example:
from pydantic import validate_arguments
from pydantic import BaseModel

class Model(BaseModel):
    name: str
    age: int

@validate_arguments
def myfunc(m: Model):
    print(m)
    print(m.name)

myfunc({"name": "abcd", "age": 3})

Can Marshmallow auto-convert dot-delimited fields to nested JSON/dict in combination with unknown=EXCLUDE?

In trying to load() data with field names which are dot-delimited, using unknown=INCLUDE auto-converts this to nested dicts (which is what I want); however, I'd like to do this with unknown=EXCLUDE, as my data has a lot of properties I don't want to deal with.
It appears that with unknown=EXCLUDE, this auto-conversion does not happen and the dot-delimited field itself is passed to the schema, which of course is not recognized. This is confirmed by not using the unknown= param at all, which raises a ValidationError.
Is it possible to combine unknown=EXCLUDE and still get nested data? Or is there a better way to deal with this situation?
Thanks in advance!
# using marshmallow v3.7.1
from marshmallow import Schema, fields, INCLUDE, EXCLUDE

data = {'LEVEL1.LEVEL2.LEVEL3': 'FooBar'}

class Level3Schema(Schema):
    LEVEL3 = fields.String()

class Level2Schema(Schema):
    LEVEL2 = fields.Nested(Level3Schema)

class Level1Schema(Schema):
    LEVEL1 = fields.Nested(Level2Schema)

schema = Level1Schema()

print(schema.load(data, unknown=INCLUDE))
# prints: {'LEVEL1': {'LEVEL2': {'LEVEL3': 'FooBar'}}}
print(schema.load(data, unknown=EXCLUDE))
# prints: {}
print(schema.load(data))
# raises: marshmallow.exceptions.ValidationError: {'LEVEL1.LEVEL2.LEVEL3': ['Unknown field.']}

Creating custom component in SpaCy

I am trying to create a spaCy pipeline component to return Spans of meaningful text (my corpus comprises pdf documents that have a lot of garbage that I am not interested in: tables, headers, etc.)
More specifically I am trying to create a function that:
takes a doc object as an argument
iterates over the doc tokens
When certain rules are met, yield a Span object
Note I would also be happy with returning a list([span_obj1, span_obj2])
What is the best way to do something like this? I am a bit confused on the difference between a pipeline component and an extension attribute.
So far I have tried:
nlp = English()
Doc.set_extension('chunks', method=iQ_chunker)
####
raw_text = get_test_doc()
doc = nlp(raw_text)
print(type(doc._.chunks))
>>> <class 'functools.partial'>
iQ_chunker is a method that does what I explained above, and it returns a list of Span objects.
This is not the result I expect, as the function I pass in as method returns a list.
I imagine you're getting a functools partial back because you are accessing chunks as an attribute, despite having passed it in as an argument for method. If you want spaCy to intervene and call the method for you when you access something as an attribute, it needs to be
Doc.set_extension('chunks', getter=iQ_chunker)
Please see the Doc documentation for more details.
However, if you are planning to compute this attribute for every single document, I think you should make it part of your pipeline instead. Here is some simple sample code that does it both ways.
import spacy
from spacy.tokens import Doc

def chunk_getter(doc):
    # the getter is called when we access _.extension_1,
    # so the computation is done at access time
    # also, because this is a getter,
    # we need to return the actual result of the computation
    first_half = doc[0:len(doc)//2]
    second_half = doc[len(doc)//2:len(doc)]
    return [first_half, second_half]

def write_chunks(doc):
    # this pipeline component is called as part of the spaCy pipeline,
    # so the computation is done at parse time
    # because this is a pipeline component,
    # we need to set our attribute value on the doc (which must be registered)
    # and then return the doc itself
    first_half = doc[0:len(doc)//2]
    second_half = doc[len(doc)//2:len(doc)]
    doc._.extension_2 = [first_half, second_half]
    return doc

nlp = spacy.load("en_core_web_sm", disable=["tagger", "parser", "ner"])
Doc.set_extension("extension_1", getter=chunk_getter)
Doc.set_extension("extension_2", default=[])
nlp.add_pipe(write_chunks)

test_doc = nlp('I love spaCy')
print(test_doc._.extension_1)
print(test_doc._.extension_2)
This just prints [I, love spaCy] twice because it's two methods of doing the same thing, but I think making it part of your pipeline with nlp.add_pipe is the better way to do it if you expect to need this output on every document you parse.
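One caveat, not from the thread: the code above targets spaCy v2. In spaCy v3, nlp.add_pipe takes the string name of a component registered with @Language.component rather than the function itself. A rough sketch of the same pipeline-component idea under v3 (the extension name "chunks" is borrowed from the question):

```python
# spaCy v3 variant: register the component by name, then add_pipe("name").
import spacy
from spacy.language import Language
from spacy.tokens import Doc

Doc.set_extension("chunks", default=[])

@Language.component("write_chunks")
def write_chunks(doc):
    # same halving logic as the v2 example above
    first_half = doc[0:len(doc)//2]
    second_half = doc[len(doc)//2:len(doc)]
    doc._.chunks = [first_half, second_half]
    return doc

nlp = spacy.blank("en")  # no pretrained model needed for this demo
nlp.add_pipe("write_chunks")

doc = nlp("I love spaCy")
print(doc._.chunks)
```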

How to define the same field for load_only and dump_only params in a Marshmallow schema?

I am trying to build a marshmallow scheme to both load and dump data. And I get everything OK except one field.
Problem description
(If you understand the problem, you don't have to read this.)
For loading, the field's type is Decimal, and I used it like this before. Now I want to use this schema for dumping too, and for that my Flask API responds with: TypeError: Object of type Decimal is not JSON serializable. OK, I understand. I changed the type to Float. Then my legacy code started to get an exception while trying to save that field to the database (it takes Decimal only). I don't want to change the legacy code, so I looked for a solution in the marshmallow docs and found the load_only and dump_only params. It seems like those are what I wanted, but here is my problem: I want to set them on the same field. So I wondered if I could define both fields, and tried this:
class PaymentSchema(Schema):
    money = fields.Decimal(load_only=True)
    money = fields.Float(dump_only=True)
I was hoping for a miracle, of course. Actually I was thinking that it would skip the first definition (or rather, re-define it). What I got was an absence of the field at all.
Workaround solution
So I tried another solution. I created another schema for dumping and inherited it from the former schema:
class PaymentSchema(Schema):
    money = fields.Decimal(load_only=True)

class PaymentDumpSchema(PaymentSchema):
    money = fields.Float(dump_only=True)
It works. But I wonder if there's some other, native, "marshmallow-way" solution for this. I have been looking through the docs but I can't find anything.
You can use the marshmallow decorator @pre_load. In this decorator you can do whatever you want and return the value with your required type.
from marshmallow import pre_load
Import it like this; inside the hook you will get your payload and can change its type as per your requirement.
UPD: I found a good solution finally.
NEW SOLUTION
The trick is to define your field in load_fields and dump_fields inside the __init__ method.
from marshmallow.fields import Integer, String, Raw
from marshmallow import Schema

class ItemDumpLoadSchema(Schema):
    item = Raw()

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        if not (self.only and 'item' not in self.only) and \
                not (self.exclude and 'item' in self.exclude):
            self.load_fields['item'] = Integer(missing=0)
            self.dump_fields['item'] = String()
Usage:
>>> ItemDumpLoadSchema().load({})
{'item': 0}
>>> ItemDumpLoadSchema().dump({'item': 0})
{'item': '0'}
Don't forget to define the field in the schema with some field type (Raw in my example), otherwise it may raise an exception in some cases (e.g. use of the only and exclude keywords).
OLD SOLUTION
A little perverted one. It is based on @prashant-suthar's answer. I named the load field with the suffix _load and implemented @pre_load, @post_load and error handling.
from marshmallow import Schema, fields, pre_load, post_load

class ArticleSchema(Schema):
    id = fields.String()
    title = fields.String()
    text = fields.String()

class FlowSchema(Schema):
    article = fields.Nested(ArticleSchema, dump_only=True)
    article_load = fields.Int(load_only=True)

    @pre_load
    def pre_load(self, data, *args, **kwargs):
        if data.get('article'):
            data['article_load'] = data.pop('article')
        return data

    @post_load
    def post_load(self, data, *args, **kwargs):
        if data.get('article_load'):
            data['article'] = data.pop('article_load')
        return data

    def handle_error(self, exc, data, **kwargs):
        if 'article_load' in exc.messages:
            exc.messages['article'] = exc.messages.pop('article_load')
        raise exc
Why is the old solution not a good one?
It doesn't allow inheriting schemas with different handle_error methods defined, and you have to give the pre_load and post_load methods different names.
Pass the data_key argument to the field definition
The documentation mentions that the data_key parameter can be used along with dump_only or load_only to have the same field with different functionality.
So you can write your schema as...
class PaymentSchema(Schema):
    decimal_money = fields.Decimal(data_key="money", load_only=True)
    money = fields.Float(dump_only=True)
This should solve your problem. I am using data_key for similar problem in marshmallow with SQLAlchemyAutoSchema and this fixed my issue.
Edit
Note: The key in ValidationError.messages (error messages) will be decimal_money by default. You may tweak the handle_error method of the Schema class to replace decimal_money with money, but it is not recommended, as you yourself may then not be able to differentiate between the error message fields.
Thanks.
