SQLAthanor: serialize to JSON only specific fields (python-3.x)

Is there a way to serialize a SQLAlchemy model including only specific fields using SQLAthanor? The documentation doesn't mention it, so the only way I have figured out is to filter the output manually.
So this line with SQLAthanor
return jsonify([{k: v for k, v in user.to_dict().items()
                 if k in ['username', 'name', 'surname', 'email']}
                for user in users])
is equivalent to this one using Marshmallow
return jsonify(SchemaUser(only=('username', 'name', 'surname', 'email')).dump(users, many=True))
Once again, is there a built-in method in SQLAthanor to do this?

Adapting my answer from the related GitHub issue:
The only way that you can change the list of serialized fields without adjusting the instance’s configuration is to manually adjust the results of to_<FORMAT>(). Your code snippet is one way to do that, although for JSON and YAML you can also supply a custom serialize_function which accepts the dict, processes it, and serializes to JSON or YAML as appropriate:
import simplejson as json
def my_custom_serializer(value, **kwargs):
    filtered_dict = {}
    filtered_dict['username'] = value['username']
    # repeat pattern for other fields
    return json.dumps(filtered_dict)
json_result = user.to_json(serialize_function=my_custom_serializer)
Both approaches are effectively the same, but the serialize_function approach gives you more flexibility for more complex adjustments to your serialized output and (I think) code that is easier to read and maintain (though if all you're doing is adjusting the fields included, your snippet is already quite readable).
You can generalize the serialize_function as well. So if you want to give it a list of fields to include, just pass them as a keyword argument to to_json():
def my_custom_serializer(value, **kwargs):
    filter_fields = kwargs.pop("filter_fields", None)
    if filter_fields is None:
        # No filter requested: serialize the whole dict unchanged
        return json.dumps(value)
    result = {}
    for field in filter_fields:
        result[field] = value.get(field, None)
    return json.dumps(result)
result = [x.to_json(serialize_function=my_custom_serializer,
                    filter_fields=['username', 'name', 'surname', 'email'])
          for x in users]
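As a variation on the keyword-argument approach, the field list can also be captured in a closure, so each call site only passes serialize_function. The sketch below is plain Python and does not touch SQLAthanor itself; make_field_filter is a hypothetical helper name, and the sample dict merely stands in for what to_dict() might return. It assumes, as described above, that the serializer receives the dict plus any extra keyword arguments you pass to to_json().

```python
import json


def make_field_filter(fields):
    """Return a serializer that keeps only `fields` (hypothetical helper)."""
    wanted = tuple(fields)

    def serializer(value, **kwargs):
        # Keep only the requested keys that are actually present in the dict.
        filtered = {key: value[key] for key in wanted if key in value}
        # Forward any remaining keyword arguments (e.g. indent) to json.dumps.
        return json.dumps(filtered, **kwargs)

    return serializer


only_public = make_field_filter(['username', 'name', 'surname', 'email'])

# Simulated to_dict() output, standing in for a real SQLAthanor model.
sample = {'username': 'jdoe', 'name': 'John', 'surname': 'Doe',
          'email': 'jdoe@example.com', 'password_hash': 'x'}
print(only_public(sample))
```

With SQLAthanor this would presumably be used as user.to_json(serialize_function=only_public). The closure keeps the field list out of every call, while the kwargs version lets each call choose its own fields.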

Related

Can Marshmallow auto-convert dot-delimited fields to nested JSON/dict in combination with unknown=EXCLUDE?

When I load() data whose field names are dot-delimited, using unknown=INCLUDE auto-converts them to nested dicts (which is what I want). However, I'd like to do this with unknown=EXCLUDE, as my data has a lot of properties I don't want to deal with.
It appears that with unknown=EXCLUDE, this auto-conversion does not happen and the dot-delimited field itself is passed to the schema, which of course is not recognized. This is confirmed by not using the unknown= param at all, which raises a ValidationError.
Is it possible to combine unknown=EXCLUDE and still get nested data? Or is there a better way to deal with this situation?
Thanks in advance!
# using marshmallow v3.7.1
from marshmallow import Schema, fields, INCLUDE, EXCLUDE
data = {'LEVEL1.LEVEL2.LEVEL3': 'FooBar'}
class Level3Schema(Schema):
    LEVEL3 = fields.String()
class Level2Schema(Schema):
    LEVEL2 = fields.Nested(Level3Schema)
class Level1Schema(Schema):
    LEVEL1 = fields.Nested(Level2Schema)
schema = Level1Schema()
print(schema.load(data, unknown=INCLUDE))
# prints: {'LEVEL1': {'LEVEL2': {'LEVEL3': 'FooBar'}}}
print(schema.load(data, unknown=EXCLUDE))
# prints: {}
print(schema.load(data))
# raises: marshmallow.exceptions.ValidationError: {'LEVEL1.LEVEL2.LEVEL3': ['Unknown field.']}
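One possible workaround (a sketch, not something the marshmallow docs prescribe for this case) is to expand the dotted keys into nested dicts yourself before loading, for example from a method decorated with @pre_load, so the nested schemas receive the structure they expect and unknown=EXCLUDE only drops keys that are genuinely unknown. The helper below is plain Python; the name unflatten_dotted and the '.' separator are assumptions.

```python
def unflatten_dotted(data, sep='.'):
    """Expand keys like 'A.B.C' into nested dicts: {'A': {'B': {'C': ...}}}.

    Assumes no key is both a leaf value and a prefix of another key.
    """
    result = {}
    for key, value in data.items():
        parts = key.split(sep)
        node = result
        # Walk (or create) the intermediate dicts for every part but the last.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return result


print(unflatten_dotted({'LEVEL1.LEVEL2.LEVEL3': 'FooBar', 'OTHER': 1}))
```

Returning unflatten_dotted(data) from a @pre_load hook on Level1Schema should then let schema.load(data, unknown=EXCLUDE) keep the nested LEVEL1 data while dropping the rest. Note that the unknown value passed to load() may not propagate to nested schemas, so they may need their own Meta.unknown = EXCLUDE; I have not checked this against 3.7.1 specifically.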

How to define the same field with load_only and dump_only params in a Marshmallow schema?

I am trying to build a marshmallow schema to both load and dump data. Everything works except one field.
Problem description
(If you understand the problem, you don't have to read this).
For loading, its type is Decimal, and I used it like this before. Now I want to use this schema for dumping too, and for that my Flask API responds with: TypeError: Object of type Decimal is not JSON serializable. OK, I understand. I changed the type to Float. Then my legacy code started to raise an exception while trying to save that field to the database (it accepts Decimal only). I don't want to change the legacy code, so I looked for a solution in the marshmallow docs and found the load_only and dump_only params. They seem to be what I want, but here is my problem: I want to set them on the same field. So I wondered whether I could simply define the field twice, and tried this:
class PaymentSchema(Schema):
    money = fields.Decimal(load_only=True)
    money = fields.Float(dump_only=True)
I was hoping for a miracle, of course. I actually expected the second definition to replace the first one. What I got instead was the field missing entirely.
Workaround solution
So I tried another solution. I created a separate schema for dumping and inherited it from the former schema:
class PaymentSchema(Schema):
    money = fields.Decimal(load_only=True)
class PaymentDumpSchema(PaymentSchema):
    money = fields.Float(dump_only=True)
It works. But I wonder if there's another, native, "marshmallow-way" solution for this. I have been looking through the docs but I can't find anything.
You can use the marshmallow decorator @pre_load. Inside the decorated method you can do whatever you want with the incoming data and return it with your type.
from marshmallow import pre_load
Import it like this; the decorated method receives your payload, and you can change the type as your requirements dictate.
UPD: I finally found a good solution.
NEW SOLUTION
The trick is to define your field in load_fields and dump_fields inside the __init__ method.
from marshmallow.fields import Integer, String, Raw
from marshmallow import Schema
class ItemDumpLoadSchema(Schema):
    item = Raw()
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        if not (self.only and 'item' not in self.only) and \
                not (self.exclude and 'item' in self.exclude):
            self.load_fields['item'] = Integer(missing=0)
            self.dump_fields['item'] = String()
Usage:
>>> ItemDumpLoadSchema().load({})
{'item': 0}
>>> ItemDumpLoadSchema().dump({'item': 0})
{'item': '0'}
Don't forget to declare the field in the schema class with some placeholder type (Raw in my example); otherwise it may raise an exception in some cases (e.g. when using the only and exclude keywords).
OLD SOLUTION
A somewhat hacky one. It's based on @prashant-suthar's answer. I named the load field with the suffix _load and implemented @pre_load, @post_load and error handling.
from marshmallow import Schema, fields, pre_load, post_load
class ArticleSchema(Schema):
    id = fields.String()
    title = fields.String()
    text = fields.String()
class FlowSchema(Schema):
    article = fields.Nested(ArticleSchema, dump_only=True)
    article_load = fields.Int(load_only=True)
    @pre_load
    def pre_load(self, data, *args, **kwargs):
        # Route the incoming 'article' value to the load-only field
        if data.get('article'):
            data['article_load'] = data.pop('article')
        return data
    @post_load
    def post_load(self, data, *args, **kwargs):
        # Rename the loaded value back to 'article'
        if data.get('article_load'):
            data['article'] = data.pop('article_load')
        return data
    def handle_error(self, exc, data, **kwargs):
        # Report validation errors under the public field name
        if 'article_load' in exc.messages:
            exc.messages['article'] = exc.messages.pop('article_load')
        raise exc
Why is the old solution not a good one?
It doesn't allow inheriting schemas that define different handle_error methods, and you have to give the pre_load and post_load methods different names.
Pass the data_key argument to the field definition
The documentation mentions that the data_key parameter can be used together with dump_only or load_only so that the same external field can behave differently on load and on dump.
So you can write your schema as...
class PaymentSchema(Schema):
    decimal_money = fields.Decimal(data_key="money", load_only=True)
    money = fields.Float(dump_only=True)
This should solve your problem. I am using data_key for a similar problem in marshmallow with SQLAlchemyAutoSchema, and it fixed my issue.
Edit
Note: The key in ValidationError.messages (the error messages) will be decimal_money by default. You could tweak the handle_error method of the Schema class to replace decimal_money with money, but that is not recommended, as you yourself may no longer be able to tell which field an error message refers to.
Thanks.

Overriding of CorpusView.read_block() not taken into account

I want to process a bunch of text files using NLTK, splitting them on a particular keyword. I am therefore trying to "subclass StreamBackedCorpusView, and override the read_block() method", as suggested by the documentation.
from nltk.corpus import PlaintextCorpusReader
from nltk.corpus.reader.util import StreamBackedCorpusView
class CustomCorpusView(StreamBackedCorpusView):
    def read_block(self, stream):
        block = stream.readline().split()
        print("wtf")
        return []  # obviously this is only for debugging
class CustomCorpusReader(PlaintextCorpusReader):
    CorpusView = CustomCorpusView
However my knowledge of inheritance is rusty, and it seems my overriding is not taken into account. The output of
corpus = CustomCorpusReader("/path/to/files/", ".*")
print(corpus.words())
is identical to the output of
corpus = PlaintextCorpusReader("/path/to/files", ".*")
print(corpus.words())
I guess I'm missing something obvious, but what?
The documentation actually suggests two ways of defining a custom corpus view:
Call the StreamBackedCorpusView constructor, and provide your block reader function via the block_reader argument.
Subclass StreamBackedCorpusView, and override the read_block() method.
It also suggests the first way is easier, and indeed I managed to get it working as follows:
from nltk.corpus import PlaintextCorpusReader
from nltk.corpus.reader.util import concat
class CustomCorpusReader(PlaintextCorpusReader):
    def _custom_read_block(self, stream):
        block = stream.readline().split()
        print("wtf")
        return []  # obviously this is only for debugging
    def custom(self, fileids=None):
        # Build one view per file, each using the custom block reader
        return concat(
            [
                self.CorpusView(fileid, self._custom_read_block, encoding=enc)
                for (fileid, enc) in self.abspaths(fileids, True)
            ]
        )
corpus = CustomCorpusReader("/path/to/files/", ".*")
print(corpus.custom())
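A likely explanation for why the subclass override in the question is ignored: as far as I can tell from NLTK's source, StreamBackedCorpusView.__init__ stores a non-empty block_reader argument as an instance attribute (self.read_block = block_reader), and PlaintextCorpusReader.words() always constructs its view with its own word block reader. An instance attribute shadows a method defined on the class, so the subclass's read_block never runs. The pure-Python sketch below reproduces that mechanism with hypothetical classes BaseView and OverridingView; it is not NLTK code.

```python
class BaseView:
    """Mimics a view whose constructor accepts an optional block reader."""

    def __init__(self, block_reader=None):
        if block_reader:
            # Instance attribute: takes precedence over any read_block
            # method defined on the class or on a subclass.
            self.read_block = block_reader

    def read_block(self, stream):
        return ['base']


class OverridingView(BaseView):
    def read_block(self, stream):
        return ['subclass']


def reader_supplied_by_caller(stream):
    return ['caller']


shadowed = OverridingView(block_reader=reader_supplied_by_caller)
print(shadowed.read_block(None))  # the caller's reader is used

plain = OverridingView()
print(plain.read_block(None))  # the subclass override is used
```

Under that assumption, a read_block override only takes effect when the view is constructed without a block_reader, which the stock reader methods do not do; that is one reason the block_reader route above is the simpler one.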

Can I use ModelSerializer (DRF) to move multiple fields to a JSON field in a CREATE method?

I'm building an API with the Django Rest Framework. The main requirement is that it should allow for the flexible inclusion of extra fields in the call. Based on a POST call, I would like to create a new record in Django, where some fields (varying in name and number) should be added to a JSON field (lead_request).
I'm not sure whether I should use the ModelSerializer, as I don't know how to handle the various fields that should be merged into one JSON field. In the create method I can't merge the additional fields into the JSON, as they aren't validated.
class Leads(models.Model):
    campaign_id = models.ForeignKey(Campaigns, on_delete=models.DO_NOTHING)
    lead_email = models.EmailField(null=True, blank=True)
    lead_request = JSONField(default=dict, null=True, blank=True)
class LeadCreateSerializer(serializers.ModelSerializer):
    def get_lead_request(self):
        return {key: value for key, value in self.request.items() if key.startswith('rq_')}
    class Meta:
        model = Leads
        fields = ['campaign_id',
                  'lead_email',
                  'lead_request']
    def create(self, validated_data):
        return Leads.objects.create(**validated_data)
The documentation mostly talks about assigning validated_data, but here that isn't possible.
If I understood correctly and you want to receive parameters through the URL as well, here's an example of how you could achieve what you want:
class LeadViewSet(viewsets.ModelViewSet):
    def create(self, request, *args, **kwargs):
        # copy() so the data can be modified even when request.data is an
        # immutable QueryDict (e.g. form-encoded requests)
        data = request.data.copy()
        lead_request = generate_lead_request(request)
        data['lead_request'] = lead_request
        serializer = self.get_serializer(data=data)
        serializer.is_valid(raise_exception=True)
        ...
And on generate_lead_request you could parse all the additional fields that may have been sent through request.data (body) as well as through the request.query_params.
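The answer leaves generate_lead_request unspecified. A minimal sketch, assuming the question's convention that extra fields are prefixed with rq_ and that the body and the query string are passed in as plain mappings (in DRF these would be request.data and request.query_params); collect_lead_request is a hypothetical name, and letting body values win over query-string values is an arbitrary choice.

```python
def collect_lead_request(body, query_params, prefix='rq_'):
    """Gather prefixed extra fields from the query string and the body.

    Hypothetical helper: body values override query-string values
    that share the same key.
    """
    lead_request = {}
    for source in (query_params, body):
        for key, value in source.items():
            if key.startswith(prefix):
                lead_request[key] = value
    return lead_request


print(collect_lead_request({'lead_email': 'a@b.c', 'rq_budget': '10'},
                           {'rq_source': 'ad', 'page': '1'}))
```

In the view above, this could be called as collect_lead_request(request.data, request.query_params). Whether to keep or strip the rq_ prefix in the stored JSON is a design decision.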
If I understand the problem properly, the main obstacle here is that we don't know the exact JSON data format of lead_request. I can think of two possible solutions to this problem. I'm not sure whether either of them is appropriate; I just want to share my opinion.
case 1
Let's assume the data passed to LeadCreateSerializer is in this format:
data = {
    'campaign_id': campaign_id,
    'lead_email': lead_email,
    'lead_request': {
        # lead_request
    }
}
Then this is easy: a normal model serializer should be able to handle it. If the data is not properly formatted and it is possible to reorganize it before passing it to the serializer, then putting it in the proper format should be the responsibility of that view or function.
case 2
Let's assume it is not possible to organize the data before passing it to LeadCreateSerializer. Then we need to get the related values during validation or in get_lead_request. Since this serializer's responsibility is to create a new instance and validate its fields, we assume the whole request is available in the serializer context as self.context['request'].
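class LeadCreateSerializer(serializers.ModelSerializer):
    def generate_lead_request(self, data):
        # do all possible validation here and return the result
        # in dict format, e.g. collect the 'rq_' fields
        return {key: value for key, value in data.items() if key.startswith('rq_')}
    def get_lead_request(self):
        # serializer context is a dict; the view passes the request in it
        request = self.context['request']
        lead_request = self.generate_lead_request(request.data)
        return lead_request
    class Meta:
        model = Leads
        fields = ['campaign_id',
                  'lead_email',
                  'lead_request']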

Creating a list of Class objects from a file with no duplicates in attributes of the objects

I am currently taking some computer science courses in school and have hit a dead end, so I need a little help. As the title says, I need to create a list of class objects from a file, where objects that are duplicates are not added to the list. I was able to do this successfully with a Python set(), but apparently that isn't allowed for this particular assignment. I have tried various other ways but can't get it working without a set. I believe the point of this assignment is to compare data structures in Python, using the slowest method possible, since it also has to be timed. My code using set() is below.
import time
class Students:
    def __init__(self, LName, FName, ssn, email, age):
        self.LName = LName
        self.FName = FName
        self.ssn = ssn
        self.email = email
        self.age = age
    def getssn(self):
        return self.ssn
def main():
    t1 = time.time()
    f = open('InsertNames.txt', 'r')
    studentlist = []
    seen = set()
    for line in f:
        parsed = line.split(' ')
        parsed = [i.strip() for i in parsed]
        if parsed[2] not in seen:
            studentlist.append(Students(parsed[0], parsed[1], parsed[2], parsed[3], parsed[4]))
            seen.add(parsed[2])
        else:
            print(parsed[2], 'already in list, not added')
    f.close()
    print('final list length: ', len(studentlist))
    t2 = time.time()
    print('time = ', t2-t1)
main()
Note that the only duplicates to check for are those of the .ssn attribute, and a duplicate should not be added to the list. Is there a way to check what is already in the list by that specific attribute before adding to it?
Edit: I forgot to mention that only one list is allowed in memory.
You can write
if not any(s.ssn == parsed[2] for s in studentlist):
without committing to ssn comparison as the meaning of == for the class. At this level of work, though, you are probably expected to write out the loop and set a flag yourself rather than use a generator expression.
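Written out with an explicit loop and a flag, as suggested above, the duplicate check might look like the sketch below. Student here is a trimmed-down stand-in for the question's Students class (only ssn matters for the check), and add_if_new is a hypothetical helper name.

```python
class Student:
    """Stand-in for the question's Students class; only ssn is needed."""

    def __init__(self, ssn):
        self.ssn = ssn


def add_if_new(student_list, student):
    """Append `student` unless one with the same ssn is already listed."""
    found = False
    for existing in student_list:
        if existing.ssn == student.ssn:
            found = True
            break
    if not found:
        student_list.append(student)
    return not found  # True if the student was added


students = []
for ssn in ['111', '222', '111']:
    if not add_if_new(students, Student(ssn)):
        print(ssn, 'already in list, not added')
print('final list length:', len(students))
```

Each check scans the whole list, so the total work grows quadratically with the number of lines, which is presumably the slow behavior the assignment wants you to time against the set version.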
Since you already took the time to write a class representing a student and since ssn is a unique identifier for the instances, consider writing an __eq__ method for that class.
def __eq__(self, other):
    return self.ssn == other.ssn
This will make your life easier when you want to compare two students, and in your case make a list (specifically not a set) of students.
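One side effect worth knowing: in Python 3, a class that defines __eq__ without also defining __hash__ gets __hash__ set to None, so its instances become unhashable and cannot be put in a set at all, while list membership with in keeps working. The sketch uses a stand-in Student class; the isinstance check returning NotImplemented is an addition not in the answer, so that comparing with a non-student doesn't raise AttributeError.

```python
class Student:
    def __init__(self, ssn):
        self.ssn = ssn

    def __eq__(self, other):
        # Compare only by ssn; let Python handle non-Student objects.
        if not isinstance(other, Student):
            return NotImplemented
        return self.ssn == other.ssn


roster = [Student('111')]
print(Student('111') in roster)  # list membership uses __eq__
print(Student.__hash__ is None)  # defining __eq__ alone disables hashing

try:
    {Student('111')}
except TypeError as exc:
    print('cannot put in a set:', exc)
```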
Then your code would look something like:
student_list = []
with open('InsertNames.txt') as f:
    for line in f:
        student = Students(*line.strip().split())
        if student not in student_list:
            student_list.append(student)
Explanation
Opening a file with a with statement makes your code cleaner and
ensures the file is closed properly even if an error occurs. And
since 'r' is the default mode for open, it doesn't need to be there.
You should strip the line before splitting it just to handle some
edge cases but this is not obligatory.
Called with no argument, split() splits on runs of any whitespace;
this is not the same as split(' '), which splits on every single space
character. So the ' ' argument isn't needed here.
Creating the student before adding it to the list sounds like too
much overhead for this simple use but since there is only one
__init__ method called it is not that bad. The plus side of this
is that it makes the code more readable with the not in statement.
The in operator (and also not in, of course) checks whether the
object is in the list by comparing it with each element using ==,
which calls the __eq__ method. Since you implemented that method,
in works as expected for your Students class instances.
Only if the student doesn't exist in the list, it will be added.
One final thing: no list is created here other than the return value of split and the student_list you created.
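To make the point about split concrete: with no argument it splits on runs of whitespace and ignores leading and trailing whitespace, whereas split(' ') splits on every single space and keeps the trailing newline. The sample line below is made up. With a double space, the question's split(' ') followed by strip() would leave an empty string in the list and shift every later field by one position.

```python
line = 'Doe  John 123-45-6789 jdoe@example.com 20\n'

print(line.split())     # whitespace runs collapsed, newline dropped
print(line.split(' '))  # empty string for the double space, '20\n' kept
```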
