I have the following schema:
For the Boss class, I need the names of all the agents who made the sales with the highest value (something like: for each agent, select the agent's name if he has
max(for each command, total = total + price of the product * quantity from the command)).
How do I do this in OCL?
Assuming you are at the root of the model (no particular context), to select the top 20 agents:
Agent.allInstances()->sortedBy(- sale->collect(quantity*product.price)->sum())->subSequence(1, 20)
and from a Boss instance:
self.workers->sortedBy(- sale->collect(quantity*product.price)->sum())->subSequence(1, 20)
The idea behind the query is (for the first one):
get all the agents (Agent.allInstances())
sort them (...->sortedBy(...))
by the sum of their sales (... sale->...->sum())
a "sale" is the quantity multiplied by the price of the referenced product (quantity*product.price)
compute this value for each sale (...sale->collect(...))
negate the resulting sum so that the top agents come first (... - sale->collect()->sum()...)
from this final list, select a subsequence (...->subSequence(1,X))
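For readers more at home in Python than OCL, the same steps can be sketched as below. This is only an illustration of the logic, not OCL tooling; the class names and sample data (Ann, Bob, Cy and the products) are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    price: float


@dataclass
class Sale:
    quantity: int
    product: Product


@dataclass
class Agent:
    name: str
    sales: list = field(default_factory=list)


def total_sales(agent):
    # Mirrors: sale->collect(quantity*product.price)->sum()
    return sum(s.quantity * s.product.price for s in agent.sales)


def top_agents(agents, n):
    # Mirrors: ->sortedBy(-total)->subSequence(1, n)
    return sorted(agents, key=lambda a: -total_sales(a))[:n]


# Hypothetical sample data, only for the demonstration.
pen = Product(price=2.0)
desk = Product(price=150.0)
agents = [
    Agent("Ann", [Sale(10, pen)]),                # total 20.0
    Agent("Bob", [Sale(1, desk), Sale(5, pen)]),  # total 160.0
    Agent("Cy", [Sale(2, desk)]),                 # total 300.0
]

print([a.name for a in top_agents(agents, 2)])  # ['Cy', 'Bob']
```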
EDIT>
Just a detail about association class navigation (from the "OCL Specification", p.21)
To specify navigation to association classes (Job and Marriage in the example), OCL uses a dot and the name of the
association class
In the early version of the specification, the association class name is written starting with a lowercase letter; in the later version, the name is left unchanged.
EDIT2>
To get the highest score and the names of the agents who reach it:
let score : Integer = self.workers->collect(sale->collect(quantity*product.price)->sum())->sortedBy(i | -i)->first()
in self.workers->select(sale->collect(quantity*product.price)->sum() = score).name
The let first selects the highest score (collect all the scores, sort them in descending order and take the first element; the value itself must not be negated, or it would never match an agent's total). The body then selects all the workers whose score equals the previously computed one.
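The "compute the best score first, then keep everyone who matches it" pattern is perhaps easier to see in a small Python sketch. The dictionary layout and the names in it are assumptions made for the example; each sale is a (quantity, price) pair.

```python
def best_agents(sales_by_agent):
    """Return (highest total, names of every agent reaching it)."""
    # One total per agent: sum of quantity * price over that agent's sales.
    totals = {
        name: sum(qty * price for qty, price in sales)
        for name, sales in sales_by_agent.items()
    }
    best = max(totals.values())
    # Keep every agent tied for the best total, not just the first one.
    return best, [name for name, total in totals.items() if total == best]


# Hypothetical sample data: Ann and Cy are tied at 300.0.
sales_by_agent = {
    "Ann": [(10, 2.0), (1, 280.0)],
    "Bob": [(1, 150.0), (5, 2.0)],
    "Cy": [(2, 150.0)],
}

print(best_agents(sales_by_agent))  # (300.0, ['Ann', 'Cy'])
```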
What is the difference between filter with multiple arguments and chain filter in django?
As you can see in the generated SQL statements the difference is not the "OR" as some may suspect. It is how the WHERE and JOIN is placed.
Example1 (same joined table): from https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships
Blog.objects.filter(
entry__headline__contains='Lennon',
entry__pub_date__year=2008)
This will give you all the Blogs that have one entry with both (entry__headline__contains='Lennon') AND (entry__pub_date__year=2008), which is what you would expect from this query.
Result:
Blog with {entry.headline: 'Life of Lennon', entry.pub_date: '2008'}
Example 2 (chained)
Blog.objects.filter(
entry__headline__contains='Lennon'
).filter(
entry__pub_date__year=2008)
This covers all the results from Example 1, but it can return more results, because it first filters all the blogs with (entry__headline__contains='Lennon') and then filters that result with (entry__pub_date__year=2008).
The difference is that it will also give you results like:
A single Blog with multiple entries
{entry.headline: '**Lennon**', entry.pub_date: 2000},
{entry.headline: 'Bill', entry.pub_date: **2008**}
When the first filter is evaluated, the blog is included because of the first entry (even though it has other entries that don't match). When the second filter is evaluated, the blog is included because of the second entry.
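The difference between the two examples can be mimicked without a database. The sketch below is plain Python over in-memory dictionaries, not Django code; the blog names and entries are made up, and it only models which blogs match (it does not reproduce the duplicate rows a real join can return).

```python
# Hypothetical data: blog name -> list of its entries.
blogs = {
    "Blog 1": [{"headline": "Life of Lennon", "year": 2008}],
    "Blog 2": [{"headline": "Lennon", "year": 2000},
               {"headline": "Bill", "year": 2008}],
    "Blog 3": [{"headline": "Bill", "year": 2008}],
}


def has_lennon(entry):
    return "Lennon" in entry["headline"]


def in_2008(entry):
    return entry["year"] == 2008


def single_filter(blogs):
    # filter(a, b): one and the same entry must satisfy both conditions.
    return [name for name, entries in blogs.items()
            if any(has_lennon(e) and in_2008(e) for e in entries)]


def chained_filter(blogs):
    # filter(a).filter(b): each condition may be met by a different entry.
    return [name for name, entries in blogs.items()
            if any(has_lennon(e) for e in entries)
            and any(in_2008(e) for e in entries)]


print(single_filter(blogs))   # ['Blog 1']
print(chained_filter(blogs))  # ['Blog 1', 'Blog 2']
```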
One table: if the query doesn't involve joined tables, as in the examples from Yuji and DTing, the result is the same.
The case in which the result of a "multiple arguments filter-query" differs from that of a "chained-filter-query" is the following:
Selecting referenced objects on the basis of referencing objects, where the relationship is one-to-many (or many-to-many).
Multiple filters:
Referenced.filter(referencing1_a=x, referencing1_b=y)
# same referencing model ^^ ^^
Chained filters:
Referenced.filter(referencing1_a=x).filter(referencing1_b=y)
The two queries can output different results if more than one row in the referencing model Referencing1 can refer to the same row in the referenced model Referenced. This can be the case when Referenced : Referencing1 have either a 1:N (one-to-many) or an N:M (many-to-many) relationship.
Example:
Consider that my application my_company has two models, Employee and Dependent. An employee in my_company can have more than one dependent (in other words, a dependent can be the son/daughter of a single employee, while an employee can have more than one son/daughter).
(I am assuming that a husband and wife can't both work in my_company, so I took a 1:m example.)
So Employee is the referenced model, which can be referenced by more than one Dependent, the referencing model. Now consider the relation state as follows:
Employee: Dependent:
+------+ +------+--------+-------------+--------------+
| name | | name | E-name | school_mark | college_mark |
+------+ +------+--------+-------------+--------------+
| A | | a1 | A | 79 | 81 |
| B | | b1 | B | 80 | 60 |
+------+ | b2 | B | 68 | 86 |
+------+--------+-------------+--------------+
Dependent a1 refers to employee A, and dependents b1 and b2 refer to employee B.
Now my query is:
Find all employees who have a son/daughter with distinction marks (say >= 75%) in both college and school.
>>> Employee.objects.filter(dependent__school_mark__gte=75,
... dependent__college_mark__gte=75)
[<Employee: A>]
The output is 'A' because dependent 'a1', who has distinction marks in both college and school, depends on employee 'A'. Note that 'B' is not selected because neither of B's children has distinction marks in both college and school. Relational algebra:
Employee ⋈(school_mark >=75 AND college_mark>=75)Dependent
In the second case, I need a query:
Find all employees who have some dependent with a distinction mark in school and some dependent (possibly a different one) with a distinction mark in college.
>>> Employee.objects.filter(
... dependent__school_mark__gte=75
... ).filter(
... dependent__college_mark__gte=75)
[<Employee: A>, <Employee: B>]
This time 'B' is also selected because 'B' has two children (more than one!): one has a distinction mark in school ('b1') and the other has a distinction mark in college ('b2').
The order of the filters doesn't matter; we can also write the above query as:
>>> Employee.objects.filter(
... dependent__college_mark__gte=75
... ).filter(
... dependent__school_mark__gte=75)
[<Employee: A>, <Employee: B>]
The result is the same! The relational algebra can be:
(Employee ⋈(school_mark >=75)Dependent) ⋈(college_mark>=75)Dependent
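To check the two relational-algebra expressions against the sample data, here is a small sketch using Python's built-in sqlite3 module instead of Django. The table and column names are my own lowercase versions of the tables above, and the DISTINCT is added only to keep the output tidy (it is an assumption of this sketch, not something Django adds).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (name TEXT PRIMARY KEY);
    CREATE TABLE dependent (
        name TEXT PRIMARY KEY,
        e_name TEXT REFERENCES employee(name),
        school_mark INTEGER,
        college_mark INTEGER
    );
    INSERT INTO employee VALUES ('A'), ('B');
    INSERT INTO dependent VALUES
        ('a1', 'A', 79, 81),
        ('b1', 'B', 80, 60),
        ('b2', 'B', 68, 86);
""")

# One join: both conditions must hold on the same dependent row.
one_join = conn.execute("""
    SELECT DISTINCT e.name
    FROM employee e
    JOIN dependent d ON d.e_name = e.name
    WHERE d.school_mark >= 75 AND d.college_mark >= 75
    ORDER BY e.name
""").fetchall()

# Two joins: each condition is checked against its own copy of dependent,
# so the two conditions may be satisfied by different rows.
two_joins = conn.execute("""
    SELECT DISTINCT e.name
    FROM employee e
    JOIN dependent d1 ON d1.e_name = e.name
    JOIN dependent d2 ON d2.e_name = e.name
    WHERE d1.school_mark >= 75 AND d2.college_mark >= 75
    ORDER BY e.name
""").fetchall()

print(one_join)   # [('A',)]
print(two_joins)  # [('A',), ('B',)]
```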
Note the following:
dq1 = Dependent.objects.filter(college_mark__gte=75, school_mark__gte=75)
dq2 = Dependent.objects.filter(college_mark__gte=75).filter(school_mark__gte=75)
Both output the same result: [<Dependent: a1>]
I checked the target SQL query generated by Django using print dq1.query and print dq2.query; both are the same (Django 1.6).
But semantically the two look different to me: the first looks like a simple selection σ[school_mark >= 75 AND college_mark >= 75](Dependent) and the second like a slow nested query: σ[school_mark >= 75](σ[college_mark >= 75](Dependent)).
If anyone needs the code: #codepad
By the way, this is covered in the documentation under "Spanning multi-valued relationships"; I have just added an example, which I think will be helpful for someone new.
Most of the time, there is only one possible set of results for a query.
The use for chaining filters comes when you are dealing with m2m:
Consider this:
# will return all Model with m2m field 1
Model.objects.filter(m2m_field=1)
# will return Model with both 1 AND 2
Model.objects.filter(m2m_field=1).filter(m2m_field=2)
# this will NOT work
Model.objects.filter(Q(m2m_field=1) & Q(m2m_field=2))
Other examples are welcome.
This answer is based on Django 3.1.
Environment
Models
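from django.db import models

class Blog(models.Model):
    blog_id = models.CharField(max_length=100)

class Post(models.Model):
    # The field class is ForeignKey (there is no ForeignKeyField); on_delete is required since Django 2.0
    blog = models.ForeignKey(Blog, on_delete=models.CASCADE)
    title = models.CharField(max_length=100)
    pub_year = models.CharField(max_length=4)  # Don't use CharField for date in production =]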
Database tables
Filters call
Blog.objects.filter(post__title="Title A", post__pub_year="2020")
# Result: <QuerySet [<Blog: 1>]>
Blog.objects.filter(post__title="Title A").filter(post__pub_year="2020")
# Result: <QuerySet [<Blog: 1>, <Blog: 2>]>
Explanation
Before going any further, I should note that this answer applies to filtering objects through a "ManyToManyField" or a reverse "ForeignKey".
If you are filtering on the same table or through a "OneToOneField", there is no difference between a "Multiple Arguments Filter" and a "Filter-chain". Both work like an "AND" condition filter.
A simple way to remember the difference: in a "ManyToManyField" or reverse "ForeignKey" filter, a "Multiple Arguments Filter" is an "AND" condition on a single related row, while a "Filter-chain" behaves loosely like an "OR" across related rows: every condition must still be met, but each one may be met by a different related row.
The reason "Multiple Arguments Filter" and "Filter-chain" differ so much is that they fetch results from different join tables and use different conditions in the query statement.
"Multiple Arguments Filter" use "Post"."Public_Year" = '2020' to identify the public year
SELECT *
FROM "Book"
INNER JOIN "Post" ON ("Book"."id" = "Post"."book_id")
WHERE "Post"."Title" = 'Title A'
AND "Post"."Public_Year" = '2020'
"Filter-chain" database query use "T1"."Public_Year" = '2020' to identify the public year
SELECT *
FROM "Book"
INNER JOIN "Post" ON ("Book"."id" = "Post"."book_id")
INNER JOIN "Post" T1 ON ("Book"."id" = "T1"."book_id")
WHERE "Post"."Title" = 'Title A'
AND "T1"."Public_Year" = '2020'
But why do different conditions impact the result?
I believe most of us who come to this page, including me =], start with the same assumption about "Multiple Arguments Filter" and "Filter-chain".
We believe the result is fetched from a table like the following one, which is correct for "Multiple Arguments Filter". So if you use the "Multiple Arguments Filter", you get the result you expect.
But for the "Filter-chain", Django creates a different query statement, which changes the above table to the following one. Also, because of the changed query statement, the "Public Year" is identified under the "T1" section instead of the "Post" section.
But where does this weird "Filter-chain" join table diagram come from?
I'm not a database expert. The explanation below is what I have understood so far after creating the exact database structure and testing it with the same query statement.
The following diagram shows where this weird "Filter-chain" join table comes from.
The database first creates a join table by matching the rows of the "Blog" and "Post" tables one by one.
After that, the database repeats the same matching process, this time matching the step 1 result table against the "T1" table, which is just the same "Post" table.
And this is where the weird "Filter-chain" join table comes from.
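The two matching steps can be imitated in a few lines of Python to see which rows the second join produces. The post rows below are hypothetical (the original tables are not reproduced here); they are chosen so that Blog 1 has one post and Blog 2 has two.

```python
from itertools import product

# Hypothetical "Post" rows; "blog" is the id of the owning blog.
posts = [
    {"blog": 1, "title": "Title A", "pub_year": "2020"},
    {"blog": 2, "title": "Title A", "pub_year": "2019"},
    {"blog": 2, "title": "Title B", "pub_year": "2020"},
]

# Steps 1 and 2 combined: pair every post with every post (T1) of the same blog.
joined = [(post, t1) for post, t1 in product(posts, posts)
          if post["blog"] == t1["blog"]]

# WHERE "Post"."Title" = 'Title A' AND "T1"."Public_Year" = '2020'
matching_blogs = sorted({post["blog"] for post, t1 in joined
                         if post["title"] == "Title A"
                         and t1["pub_year"] == "2020"})

print(len(joined))      # 5 rows: 1 for blog 1, 2 x 2 for blog 2
print(matching_blogs)   # [1, 2]
```

Blog 2 matches only because the row pairing its "Title A" post with its 2020 post exists in the self-joined table.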
Conclusion
So two things make "Multiple Arguments Filter" and "Filter-chain" different.
Django creates different query statements for "Multiple Arguments Filter" and "Filter-chain", so their results come from different join tables.
The "Filter-chain" query statement checks a condition in a different place than the "Multiple Arguments Filter".
The rough way to remember it: in a "ManyToManyField" or reverse "ForeignKey" filter, a "Multiple Arguments Filter" is an "AND" condition on a single related row, while a "Filter-chain" behaves loosely like an "OR" across related rows, since each condition may be satisfied by a different related row.
The performance difference is huge. Try it and see.
Model.objects.filter(condition_a).filter(condition_b).filter(condition_c)
is surprisingly slow compared to
Model.objects.filter(condition_a, condition_b, condition_c)
As mentioned in Effective Django ORM,
QuerySets maintain state in memory
Chaining triggers cloning, duplicating that state
Unfortunately, QuerySets maintain a lot of state
If possible, don’t chain more than one filter
You can use the connection module to see the raw SQL queries and compare them. As Yuji explained, for the most part they are equivalent, as shown here:
>>> from django.db import connection
>>> samples1 = Unit.objects.filter(color="orange", volume=None)
>>> samples2 = Unit.objects.filter(color="orange").filter(volume=None)
>>> list(samples1)
[]
>>> list(samples2)
[]
>>> for q in connection.queries:
...     print(q['sql'])
...
SELECT `samples_unit`.`id`, `samples_unit`.`color`, `samples_unit`.`volume` FROM `samples_unit` WHERE (`samples_unit`.`color` = orange AND `samples_unit`.`volume` IS NULL)
SELECT `samples_unit`.`id`, `samples_unit`.`color`, `samples_unit`.`volume` FROM `samples_unit` WHERE (`samples_unit`.`color` = orange AND `samples_unit`.`volume` IS NULL)
>>>
If you end up on this page looking for how to dynamically build up a django queryset with multiple chaining filters, but you need the filters to be of the AND type instead of OR, consider using Q objects.
An example:
from django.db.models import Q

# First filter by type.
filters = None
if param in CARS:
    objects = app.models.Car.objects
    filters = Q(tire=param)
elif param in PLANES:
    objects = app.models.Plane.objects
    filters = Q(wing=param)
# Now filter by location.
if location == 'France':
    filters = filters & Q(quay=location)
elif location == 'England':
    filters = filters & Q(harbor=location)
# Finally, generate the actual queryset
queryset = objects.filter(filters)
If you require a and b to hold together (on the same related row), then
and_query_set = Model.objects.filter(a=a, b=b)
If you require a as well as b, but not necessarily on the same related row, then
chained_query_set = Model.objects.filter(a=a).filter(b=b)
Official Documents:
https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships
Related Post: Chaining multiple filter() in Django, is this a bug?
There is a difference when the lookup goes through a related object, for example:
class Book(models.Model):
    author = models.ForeignKey(Author)
    name = models.ForeignKey(Region)
class Author(models.Model):
    name = models.ForeignKey(Region)
The request
Author.objects.filter(Q(book__name='name1') & Q(book__name='name2'))
returns an empty set, because a single book cannot have both names. (Q comes from django.db.models; it is needed here because passing book__name twice as a plain keyword argument is a Python syntax error.)
The request
Author.objects.filter(book__name='name1').filter(book__name='name2')
returns authors that have books with both 'name1' and 'name2'.
for details look at
https://docs.djangoproject.com/en/dev/topics/db/queries/#s-spanning-multi-valued-relationships
I have query that returns the following queryset:
results = <QuerySet [<Product: ItemA>, <Product: ItemA>, <Product: ItemB>, <Product: ItemB>, <Product: ItemB>, <Product: ItemC>, <Product: ItemC>]>
The __str__ representation of the model is name and each Product variation likely has a different value for the price field. After this query, I need to search my database for each Product in the queryset and return the lowest price for each unique name so like:
Lowest price for all in database where name is == to ItemA
Lowest price for all in database where name is == to ItemB
Lowest price for all in database where name is == to ItemC
I use the following block of code to accomplish this goal:
query_list = []
for each in results:
    if each.name not in query_list:  # Checks that the object's name is not yet in the query list
        query_list.append(each.name)  # Adds just the name, so there is only one of each name in query_list
for each in query_list:
    priced = results.filter(name=each).order_by('price').first()  # Lowest price for each name in query_list
This feels very inefficient. Is there a way to do a similar computation without appending the unique name of each Product to a separate list, iterating over that list, and then making a query for each one? I feel like there is a way to use some kind of complex lookup to accomplish my goal, maybe even use less Python and make the db do more of the work, but the above is the best I've been able to figure out so far. There can be a lot of different hits in results, so I need this block to be as efficient as possible.
It is easy after reading the docs "Generating aggregates for each item in a QuerySet" and also "Interaction with default ordering or order_by()".
from django.db.models import Min
# First query: the lowest price per name (order_by() clears any default ordering)
prices = {x['name']: x['lowest_price']
          for x in results.values('name').annotate(lowest_price=Min('price')).order_by()}
# Second query: keep the first row found at the lowest price for each name
for product in results:
    if product.name in prices and product.price == prices[product.name]:
        priced = product  # output the row
        del prices[product.name]
That runs two database queries.
An even more efficient solution with one query is probably possible with a Window function, but it requires a database backend that supports window functions, so it may not work e.g. in tests with an older sqlite3.
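For comparison, the grouping logic itself (keep the cheapest row per name) can be sketched in plain Python over in-memory rows. This does not replace the database aggregation above; it just shows what the two-step approach computes. The prices are made up for the example.

```python
def lowest_price_rows(rows):
    """Map each name to the row with the lowest price, in a single pass."""
    best = {}
    for row in rows:
        current = best.get(row["name"])
        # Keep the first row seen for a name, or replace it with a cheaper one.
        if current is None or row["price"] < current["price"]:
            best[row["name"]] = row
    return best


# Hypothetical rows standing in for the Product queryset.
rows = [
    {"name": "ItemA", "price": 12.0},
    {"name": "ItemA", "price": 9.5},
    {"name": "ItemB", "price": 4.0},
    {"name": "ItemB", "price": 4.5},
    {"name": "ItemB", "price": 3.0},
    {"name": "ItemC", "price": 20.0},
    {"name": "ItemC", "price": 25.0},
]

cheapest = lowest_price_rows(rows)
print({name: row["price"] for name, row in cheapest.items()})
# {'ItemA': 9.5, 'ItemB': 3.0, 'ItemC': 20.0}
```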
I have a dataset that looks like:
Date | CONSUMER DISCR | CONSUMER STAPLES | ENERGY | FINANCIALS | HEALTH CARE | INDUSTRIALS | INFORMATION TECH | MATERIALS | REAL ESTATE | TELECOM SVC | UTILITIES
2/28/2006 | 0.16630621 | 0.045185409 | 0.044640056 | 0.123505969 | 0.053980333 | 0.088535648 | 0.234666154 | 0.119729025 | 0.034316211 | 0.067272708 | 0.021862279
3/31/2006 | 0.13323423 | 0.0135331245 | 0.022255232 | 0.124240924 | 0.054290724 | 0.088825904 | 0.055432 | 0.118432505 | 0.03418562 | 0.066877285 | 0.33847323
Each of the numbers for the sectors indicates the importance of the industry to the stock market. I am not interested in all industries but the top n most important ones. (the higher the number, the more important the industry is).
I want a method in Excel that dynamically visualizes the top n values for each date. For example, for 2/28/2006, for n = 4, it should visualize INFORMATION TECH, CONSUMER DISCR, FINANCIALS, and MATERIALS.
For 3/31/2006, for n = 4, it should visualize UTILITIES, CONSUMER DISCR, FINANCIALS, and MATERIALS.
What method exists in Excel?
Per the supplied image,
=INDEX($1:$1, , MATCH(LARGE($B2:$L2, COLUMN(A:A)), $A2:$L2, 0))
use something like this:
=IF(ROW(1:1) >$O$1,"",INDEX($A$1:$L$1,AGGREGATE(15,6,COLUMN($B$1:$L$1)/(INDEX($B$2:$L$3,MATCH($O$2,$A$2:$A$3,0),0)=LARGE(INDEX($B$2:$L$3,MATCH($O$2,$A$2:$A$3,0),0),ROW(1:1))),1)))
You would put this in the first cell and copy down far enough to satisfy the largest n possible.
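If you want to double-check what the formulas should return, the same "top n per date" selection can be sketched outside Excel in a few lines of Python. This is only a cross-check, not an Excel feature; the data is copied from the two rows in the question.

```python
import heapq

sectors = ["CONSUMER DISCR", "CONSUMER STAPLES", "ENERGY", "FINANCIALS",
           "HEALTH CARE", "INDUSTRIALS", "INFORMATION TECH", "MATERIALS",
           "REAL ESTATE", "TELECOM SVC", "UTILITIES"]

rows = {
    "2/28/2006": [0.16630621, 0.045185409, 0.044640056, 0.123505969,
                  0.053980333, 0.088535648, 0.234666154, 0.119729025,
                  0.034316211, 0.067272708, 0.021862279],
    "3/31/2006": [0.13323423, 0.0135331245, 0.022255232, 0.124240924,
                  0.054290724, 0.088825904, 0.055432, 0.118432505,
                  0.03418562, 0.066877285, 0.33847323],
}


def top_n(date, n):
    # Pair each sector with its value and keep the n largest, biggest first.
    pairs = zip(sectors, rows[date])
    return [name for name, _ in heapq.nlargest(n, pairs, key=lambda p: p[1])]


print(top_n("2/28/2006", 4))
# ['INFORMATION TECH', 'CONSUMER DISCR', 'FINANCIALS', 'MATERIALS']
print(top_n("3/31/2006", 4))
# ['UTILITIES', 'CONSUMER DISCR', 'FINANCIALS', 'MATERIALS']
```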
I have a treectrl structure which is populated from an external search of an open data set hosted by our municipal government. The data pertains to business licenses and is requested using Pandas and Sodapy. The tree is populated as follows:
for index, row in results_df.iterrows():
    tradename = row['tradename']
    address = row['address']
    licTypes = row['licencetypes']
    comm = row['comdistnm']
    jobSts = row['jobstatusdesc']
    jobCrt = row['jobcreated']
    lng = row['longitude']
    lng = str(lng)
    lat = row['latitude']
    lat = str(lat)
    # Populate Tree Controls with DataFrame values
    trdName = self.thrTree.AppendItem(root, tradename)
    self.thrTree.AppendItem(trdName, address)
    self.thrTree.AppendItem(trdName, licTypes)
    self.thrTree.AppendItem(trdName, comm)
    self.thrTree.AppendItem(trdName, jobSts)
    self.thrTree.AppendItem(trdName, jobCrt)
    self.thrTree.AppendItem(trdName, lng)
    self.thrTree.AppendItem(trdName, lat)
This results in a final structure of root, then node 1 with the business name, which, when expanded, contains all the information listed above. So I'm assuming root level, then child node 1, then child.child of node 1? I'm not even sure what the second-level indented nodes are called (I've heard the term leaf used for the third level before). But I digress; what I am interested in is grabbing the latitude and longitude of where the business is located, then allowing the user to map the location if they choose. I bind wx.EVT_TREE_ITEM_ACTIVATED so that when the user double-clicks on a business name to get the details, I can grab the items displayed. This is how I am currently trying to iterate through the child nodes:
item = self.thrTree.GetSelection()
while self.thrTree.GetItemParent(item):
    piece = self.thrTree.GetItemText(item)
    tmpHldr.insert(0, piece)
    item = self.thrTree.GetItemParent(item)
Looking at item, it appears to be collecting all the business names under root, and ignoring the third level items of interest.
What do I need to do to go deeper within the tree to grab the details under the business clicked on, and not just the list of business names under the root item, which is called 'Search Results'?
Thanks!
@YYC_Code,
Did you look here?
It has a GetFirstChild()/GetNextChild() pair of functions that you can use to iterate. It also has an ItemHasChildren() function, which you can use to check whether the item has any children before using the pair mentioned above.
EDIT:
[quote]
For this enumeration function you must pass in a ‘cookie’ parameter which is opaque for the application but is necessary for the library to make these functions reentrant (i.e. allow more than one enumeration on one and the same object simultaneously). The cookie passed to GetFirstChild and GetNextChild should be the same variable.
[/quote]
You need to make sure that the cookie parameter is the same during the iteration.
You should also do this:
[quote]
Returns an invalid tree item (i.e. wx.TreeItemId.IsOk returns False) if there are no further children.
[/quote]
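Putting the two quotes together, the loop over the children of the activated item could look roughly like the sketch below. In wxPython, GetFirstChild(item) and GetNextChild(item, cookie) each return an (item, cookie) pair. So that the snippet runs without a GUI, FakeTree and FakeItem are stand-ins I made up that imitate only those few calls; with a real wx.TreeCtrl you would pass self.thrTree and the item taken from the event (for example event.GetItem()).

```python
def child_texts(tree, parent):
    """Collect the text of every direct child of `parent`."""
    texts = []
    child, cookie = tree.GetFirstChild(parent)
    # An invalid item (IsOk() is False) signals that there are no more children.
    while child.IsOk():
        texts.append(tree.GetItemText(child))
        # Pass the same cookie back in on every call.
        child, cookie = tree.GetNextChild(parent, cookie)
    return texts


# --- Hypothetical stand-ins so the sketch runs without wxPython ---
class FakeItem:
    def __init__(self, text=None, children=()):
        self.text = text
        self.children = list(children)

    def IsOk(self):
        return self.text is not None


class FakeTree:
    def GetFirstChild(self, parent):
        return self.GetNextChild(parent, 0)

    def GetNextChild(self, parent, cookie):
        if cookie < len(parent.children):
            return parent.children[cookie], cookie + 1
        return FakeItem(), cookie  # invalid item: no further children

    def GetItemText(self, item):
        return item.text


business = FakeItem("Acme Ltd", [FakeItem("123 Main St"),
                                 FakeItem("-114.07"),
                                 FakeItem("51.05")])
print(child_texts(FakeTree(), business))  # ['123 Main St', '-114.07', '51.05']
```

Since the tree in the question appends longitude and latitude as the last two children, they would be the last two entries of the returned list.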
I'm trying to create a calculated member measure for a subset of a group of locations. All other members should be null. I can limit the scope but then in the client (excel in this case) the measure does not present the grand total ([Group].[Group].[All]).
CREATE MEMBER CURRENTCUBE.[Measures].[Calculated Measure]
AS (
Null
),
FORMAT_STRING = "$#,##0.00;-$#,##0.00",
NON_EMPTY_BEHAVIOR = { [Measures].[Places] }
,VISIBLE = 1 , ASSOCIATED_MEASURE_GROUP = 'Locations';
-----------------------------------------------------------------------------------------
SCOPE ({([Group].[Group].&[location 1]),
([Group].[Group].&[location 2]),
([Group].[Group].&[location 3]),
([Group].[Group].&[location 4]),
([Group].[Group].&[location 5])
}, [Measures].[Calculated Measure]);
// Location Calculations
THIS = (
[Measures].[Adjusted Dollars] - [Measures].[Adjusted Dollars by Component] + [Measures].[Adjusted OS Dollars]
);
END SCOPE;
It's as though the [Group].[Group].[All] member is outside of the scope so it won't aggregate the rest of the members. Any help would be appreciated.
Your calculation is applied after all other calculations have already happened. You can get around this by adding Root([Time]) to the scope, assuming your time dimension is named [Time]. And if you want to aggregate across more dimensions, you would have to add them all to the SCOPE.
In most cases, when you have a calculation that you want to do before aggregation, it is easier to define the calculation elsewhere, e.g. in the DSV, with an expression like
CASE WHEN group_location in(1, 2, 3, 4) THEN
Adjusted_dollars - adjusted_dollars_by_comp + adjusted_os_dollars
ELSE NULL
END
and just make a standard aggregatable measure from it.
I've searched high and low for this answer. After reading the above suggestion, I came up with this:
Calculate the measure in a column in the source view table:
(isnull(a,0) - isnull(b,0)) + isnull(c,0) = x
Add the unfiltered calculated column (x) to the DSV.
Create a named calculation in the DSV that uses a CASE statement to filter the original calculated measure: CASE WHEN location IN (1, 2, 3) THEN x ELSE NULL END
Add the named calculation as a measure.
I chose to do it this way to capture the unfiltered measure first; then, if another filter needs to be added or one needs to be removed, I can do so without touching the views again. I just add the new member to filter by to the CASE statement of my named calculation. I also tried to put the calculation directly into a named calculation in the DSV that filtered it at the same time, but that produced incorrect results.
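The row-level logic of those steps (compute x with nulls treated as 0, null it out for locations that are not selected, then let a plain sum aggregate it) can be sketched in Python as below. It only models the arithmetic, not SSAS; the column names a, b, c follow the expression above and the sample rows are invented.

```python
def named_calculation(row, locations):
    # x = (isnull(a,0) - isnull(b,0)) + isnull(c,0)
    x = (row["a"] or 0) - (row["b"] or 0) + (row["c"] or 0)
    # CASE WHEN location IN (...) THEN x ELSE NULL END
    return x if row["location"] in locations else None


def aggregate(rows, locations):
    """Sum the non-null row values, as a standard additive measure would."""
    values = [named_calculation(r, locations) for r in rows]
    kept = [v for v in values if v is not None]
    return sum(kept) if kept else None


# Hypothetical fact rows; None stands for a NULL column value.
rows = [
    {"location": 1, "a": 100, "b": 30, "c": None},  # x = 70
    {"location": 2, "a": 50, "b": None, "c": 5},    # x = 55
    {"location": 9, "a": 999, "b": 0, "c": 0},      # filtered out
]

print(aggregate(rows, {1, 2, 3}))  # 125
```

Because the filter is applied per row before summing, the grand total is simply the sum over the selected locations.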