Convert a Pandas DataFrame to a Django QuerySet - python-3.x

I have a model called Job in which I store both original and duplicated jobs. In addition, I have another model called JobStatistics which stores the statistics of both the original and the duplicated jobs.
I want to group the statistics of the duplicated jobs, add them to the statistics of their original jobs, and return only the original jobs with the combined statistics.
Suppose a job with id=10001 has 2 clicks, and this job has 3 duplicates with 2, 4 and 8 clicks respectively. The result should be:
job_id clicks
10001 16
One approach is to convert the QuerySet to a Pandas DataFrame and perform the grouping in the DataFrame, but the problem with this approach is that I couldn't find a proper way to convert the DataFrame back to a QuerySet. I cannot simply return the DataFrame, because I cannot paginate it with a serializer.
What would be a proper way to handle a situation like this?
Thank you.
# models.py
class Job(models.Model):
    title = models.TextField(_("Job Title"))
    original_job = models.ForeignKey('self', on_delete=models.CASCADE, null=True, blank=True)
    ...

class JobStatistics(models.Model):
    job = models.ForeignKey("jobs.Job", verbose_name=_("Metric Job"), on_delete=models.CASCADE)
    spend = models.FloatField(_("Spend"), default=0)
    clicks = models.IntegerField(_("Job Clicks"), default=0)
    impressions = models.IntegerField(_("Job Impressions"), default=0)
    ...
# views.py
statistics = JobStatistics.objects.values('job').annotate(
    title=F('job__title'),
    reference_number=F('job__reference_number'),
    clicks=Sum('clicks'),
    original_job=F('job__original_job'),
    impressions=Sum('impressions'),
    cost=Round(Sum('spend'))
)
df = pd.DataFrame(statistics.values())
# perform the calculation...
...
options = [
    'title', 'reference_number',
    'clicks', 'impressions'
]
# 'order' is assumed to come from the request's query parameters
if order in options:
    statistics = statistics.order_by(order)
else:
    statistics = statistics.order_by('-title')
page = self.paginate_queryset(statistics)
if page is not None:
    serializer = JobStatisticsSerializer(page, many=True)
    return self.get_paginated_response(serializer.data)
serializer = JobStatisticsSerializer(statistics, many=True)
return Response(serializer.data)
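
For what it's worth, the grouping can also stay inside the ORM, which sidesteps the DataFrame round-trip entirely and keeps a real QuerySet for paginate_queryset(). A minimal sketch, assuming the models above (Coalesce folds each duplicate into its original job before aggregating):

from django.db.models import Sum
from django.db.models.functions import Coalesce

statistics = (
    JobStatistics.objects
    # root_job is the original job's id when set, else the job's own id
    .annotate(root_job=Coalesce('job__original_job', 'job__id'))
    .values('root_job')  # GROUP BY root_job
    .annotate(
        clicks=Sum('clicks'),
        impressions=Sum('impressions'),
        cost=Sum('spend'),
    )
    .order_by('root_job')
)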

Related

Selecting random rows from a table with SQLAlchemy gives non-random results

I have, on a pgsql backend, a table and an (allegedly) random access version of it:
class Topic(db.Model):
    id = db.Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    question_id = db.Column(UUID(as_uuid=True), db.ForeignKey("question.id"))
    topic = db.Column(Enum(QuestionTopicsChoices))

# Create an alias for Topic that uses the BERNOULLI sampling method
Topic_rnd = db.aliased(Topic, tablesample(Topic, func.bernoulli(2.5), 10))
I'd expect, then, that these two queries return different results:
topics_linear = Topic.query.filter_by(topic='emotion').all()
topics_random = Topic_rnd.query.filter_by(topic='emotion').all()
but they return the records in the same order. What am I doing wrong?
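
One detail worth checking here (an assumption, not a verified diagnosis): AliasedClass attribute access falls through to the underlying model, so Topic_rnd.query may simply be the plain Topic.query and never render the TABLESAMPLE clause. Building the query from the session makes the sampled alias explicit:

# Query through the alias itself so the FROM clause uses the
# TABLESAMPLE version of the table (Topic_rnd as defined above).
topics_random = (
    db.session.query(Topic_rnd)
    .filter_by(topic='emotion')
    .all()
)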

Django Subquery many values

class Category(models.Model):
    name = models.CharField(max_length=100)
    date = models.DateTimeField(auto_now=True)

class Hero(models.Model):
    name = models.CharField(max_length=100)
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
I want the Category model's name, date, and id.
Following the cookbook, I wrote the code below.
hero_qs = Hero.objects.filter(
    category=OuterRef("pk")
).order_by("-benevolence_factor")

Category.objects.all().annotate(
    most_benevolent_hero=Subquery(
        hero_qs.values('name')[:1]
    )
)
It seems that only one value can be selected in hero_qs.values('name').
Is it possible to get name, date, and id with one annotation?
You can try concatenating the fields if you really want to use a single annotation:
from django.db.models import Subquery, OuterRef, CharField, Value as V
from django.db.models.functions import Concat

hero_qs = Hero.objects.filter(
    category=OuterRef("pk")
).order_by("-benevolence_factor").annotate(
    details=Concat('name', V(','), 'id', output_field=CharField())
)

Category.objects.all().annotate(
    most_benevolent_hero=Subquery(
        hero_qs.values('details')[:1]
    )
)
Then you can split the string to separate that data out, which is a relatively inexpensive operation:
name, id = category.most_benevolent_hero.split(',')
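
On Django 3.2+, JSONObject gives a cleaner variant of the same idea: the subquery returns the fields as a single JSON value, so no splitting is needed afterwards. A sketch against the models above:

from django.db.models import OuterRef, Subquery
from django.db.models.functions import JSONObject

hero_qs = Hero.objects.filter(
    category=OuterRef("pk")
).order_by("-benevolence_factor")

# Each category is annotated with {"id": ..., "name": ...} for its top hero.
Category.objects.annotate(
    most_benevolent_hero=Subquery(
        hero_qs.values(hero=JSONObject(id="id", name="name"))[:1]
    )
)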

Best way of using Django queryset where JSONField != {}

class JobAnalysis(Base, XYZ):
    env_vars = JSONField(
        default=dict
    )
    job = models.ForeignKey(
        Job, related_name='jobanalyses'
    )
    seller = models.ForeignKey(
        ABC,
        null=True
    )

class Usage(Base):
    job = models.ForeignKey(
        Job, null=True, blank=True
    )
I want all usages where env_vars has at least one key-value pair.
usages_qs = Usage.objects.filter(
    job__jobanalyses__seller__isnull=True
).exclude(
    job__jobanalyses__env_vars__exact={}
)
I am using the above queryset to fetch all usage information where seller is null and env_vars is not equal to {}.
usages_qs.query
SELECT "Usage"."job",
FROM "Usage"
LEFT OUTER JOIN "Job" ON ("Usage"."job_id" = "Job"."id")
LEFT OUTER JOIN "JobAnalysis" ON ("Job"."id" = "JobAnalysis"."job_id")
WHERE ("JobAnalysis"."seller_id" IS NULL
       AND NOT ("Usage"."job_id" IN
                    (SELECT U2."job_id"
                     FROM "JobAnalysis" U2
                     WHERE U2."env_vars" = '{}')
                AND "Usage"."job_id" IS NOT NULL))
But I am seeing a performance issue here, because .exclude(job__jobanalyses__env_vars__exact={}) creates an inner query, and because of that this select statement is timing out.
Is there a better way of writing this Django queryset to get all usage records where seller is null and env_vars != {}?
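
A common rewrite for this pattern (a sketch against the schema above, not a tested fix) is a single correlated EXISTS, which databases usually plan better than the NOT IN subquery that exclude() generates:

from django.db.models import Exists, OuterRef

# One subquery expresses both conditions: seller is null and env_vars != {}.
matching_analyses = JobAnalysis.objects.filter(
    job=OuterRef('job'),
    seller__isnull=True,
).exclude(env_vars={})

usages_qs = Usage.objects.filter(Exists(matching_analyses))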

Change SQLAlchemy __tablename__

I am using SQLAlchemy to handle requests from an API endpoint; my database tables (I have hundreds) are differentiated via a unique string (e.g. test_table_123)...
In the code below, __tablename__ is static. If possible, I would like that to change based on the specific table I would like to retrieve, as it would be tedious to write several hundred unique classes.
from config import db, ma  # SQLAlchemy is init'd and tied to Flask in this config module

class specific_table(db.Model):
    __tablename__ = 'test_table_123'
    var1 = db.Column(db.Integer, primary_key=True)
    var2 = db.Column(db.String, index=True)
    var3 = db.Column(db.String)

class whole_table_schema(ma.ModelSchema):
    class Meta:
        model = specific_table
        sqla_session = db.session

def single_table(table_name):
    # collect the data from the unique table
    my_data = specific_table.query.order_by(specific_table.var1).all()
Thank you very much for your time in advance.
You can use the reflection feature of SQLAlchemy:
from sqlalchemy import MetaData

engine = db.engine
metadata = MetaData()
metadata.reflect(bind=engine)
and finally
db.session.query(metadata.tables[table_name])
If you want a smoother querying experience, which the previous solution cannot offer, you can declare and map your tables: tables = {table_name: create_table(table_name) for table_name in table_names}, where create_table constructs models with different __tablename__ values. Instead of creating all tables at once, you can create them on demand, as in the sketch below.
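
A minimal sketch of such a create_table factory (the column layout is an assumption copied from specific_table above):

def create_table(table_name):
    # type() builds a new db.Model subclass per table name, on demand.
    return type(
        f'Table_{table_name}',
        (db.Model,),
        {
            '__tablename__': table_name,
            '__table_args__': {'extend_existing': True},
            'var1': db.Column(db.Integer, primary_key=True),
            'var2': db.Column(db.String, index=True),
            'var3': db.Column(db.String),
        },
    )

# Usage: map only the table you actually need.
table = create_table('test_table_123')
rows = table.query.order_by(table.var1).all()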

Creating relationships with ORM Objects before inserting into a Database

Right now I have double the classes for the same data. The first are the Bill and Expense classes, used locally to exchange data within the program. Then I have Bill_Table and Expense_Table, used to exchange data between the program and the database. This makes my program needlessly complicated, when I just want one of each.
Bill has a member variable that is a list of Expenses, like so:
class Bill:
    vendor = None  # type: str
    expenses = None  # type: list[Expense]
    # plenty more variables here

    def __init__(self, vendor=None):
        self.vendor = vendor
        self.expenses = list()

class Expense:
    account = None  # type: str
    amount = None  # type: int
    # etc...
My Bill_Table and Expense_Table are set up pretty much identically. I use some functions to convert a Bill into a Bill_Table, or an Expense into an Expense_Table, or vice versa.
from sqlalchemy import Column, Integer, ForeignKey
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Bill_Table(Base):
    __tablename__ = 'bills'
    id = Column(Integer, primary_key=True)
    expenses = relationship("Expense_Table")
    # etc...

class Expense_Table(Base):
    __tablename__ = 'expenses'
    id = Column(Integer, primary_key=True)
    bill_id = Column(Integer, ForeignKey('bills.id'))
    # etc...
How would I map some Expense_Table objects to a Bill_Table object without connecting to a database? That way I could have the same functionality, and when I insert a Bill_Table into the database, it would also insert its Expense_Table objects with it.
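
For what it's worth, declaratively mapped classes already behave like plain objects before any database is involved. A sketch with the models above (SQLAlchemy's default save-update cascade inserts the children along with the parent):

# Compose the object graph entirely in memory; no engine or session needed.
bill = Bill_Table()
bill.expenses.append(Expense_Table())
bill.expenses.append(Expense_Table())

# Later, a single add() flushes the bill and its expenses together,
# filling in bill_id on each expense:
# session.add(bill)
# session.commit()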
