Conditional aggregation in SSAS - scope

I have a SCOPE statement in my cube that prevents aggregation of certain measures if incompatible dimension members are used:
SCOPE (MeasureGroupMeasures('Measure Group'), [User Type].[User Type].[All]);
    this = IIF(DISTINCTCOUNT(NONEMPTY(EXISTING([User Type].[User Type].[All].Children),
                                      [Measures].[Measure Group Count])) > 1,
               NULL,
               [Measures].CurrentMember);
Basically if we are trying to aggregate data from the measure group for more than one "User Type", a null value is returned.
While this approach works fine, the performance leaves a lot to be desired.
Is there any way to achieve this that is much faster?
Thanks

Add a Measure Group that uses the same data source as your User Type dimension. Its only measure should be a count, e.g. User Type Count, set to not visible. Its only dimension relationship should be to the User Type dimension.
Now you can just say:
IIF ( [Measures].[User Type Count] > 1 , ...
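The rule behind the hidden count measure can be sketched outside the cube. The Python below only illustrates the logic and is not SSAS code; the fact rows and the name guarded_total are invented for the example.

```python
# Illustrative only: emulate "return NULL when more than one User Type is in scope".
# The fact rows and the function name are hypothetical, not part of SSAS.

def guarded_total(facts, user_types_in_scope):
    """Sum 'amount' unless the current slice spans more than one user type."""
    # Equivalent of the check [Measures].[User Type Count] > 1
    if len(user_types_in_scope) > 1:
        return None
    return sum(r["amount"] for r in facts if r["user_type"] in user_types_in_scope)

facts = [
    {"user_type": "Admin", "amount": 10},
    {"user_type": "Admin", "amount": 5},
    {"user_type": "Guest", "amount": 7},
]

print(guarded_total(facts, {"Admin"}))           # 15
print(guarded_total(facts, {"Admin", "Guest"}))  # None
```

The point of the design is that the count comes from a tiny measure group, so the check itself is cheap compared with evaluating a set expression per cell.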

Related

django filter with related name [duplicate]

What is the difference between filter with multiple arguments and chain filter in django?
As you can see in the generated SQL statements, the difference is not the "OR" that some may suspect. It is how the WHERE and JOIN clauses are placed.
Example 1 (same joined table): from https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships
Blog.objects.filter(
    entry__headline__contains='Lennon',
    entry__pub_date__year=2008)
This will give you all the Blogs that have at least one entry satisfying both (entry__headline__contains='Lennon') AND (entry__pub_date__year=2008), which is what you would expect from this query.
Result:
Blog with {entry.headline: 'Life of Lennon', entry.pub_date: '2008'}
Example 2 (chained)
Blog.objects.filter(
    entry__headline__contains='Lennon'
).filter(
    entry__pub_date__year=2008)
This will cover all the results from Example 1, but it can also return more results, because it first filters all the blogs with (entry__headline__contains='Lennon') and then, from that result, filters on (entry__pub_date__year=2008).
The difference is that it will also give you results like:
A single Blog with multiple entries
{entry.headline: '**Lennon**', entry.pub_date: 2000},
{entry.headline: 'Bill', entry.pub_date: **2008**}
When the first filter is evaluated, the blog is included because of the first entry (even though it has other entries that don't match). When the second filter is evaluated, the blog is included because of the second entry.
One table: if the query doesn't involve joined tables, as in the examples from Yuji and DTing, the result is the same.
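The difference between the two examples can be modelled in plain Python without Django. This is only a sketch of the semantics, not of how Django builds the query, and the three sample blogs are invented.

```python
# Plain-Python model of the two query shapes; no Django required.
# The sample data below is invented for illustration.
blogs = {
    "Blog 1": [{"headline": "Life of Lennon", "year": 2008}],
    "Blog 2": [{"headline": "Lennon", "year": 2000},
               {"headline": "Bill", "year": 2008}],
    "Blog 3": [{"headline": "Bill", "year": 2008}],
}

def has_lennon(entry):
    return "Lennon" in entry["headline"]

def is_2008(entry):
    return entry["year"] == 2008

# filter(a, b): a single entry must satisfy both conditions.
single_call = [name for name, entries in blogs.items()
               if any(has_lennon(e) and is_2008(e) for e in entries)]

# filter(a).filter(b): each condition may be satisfied by a different entry.
chained = [name for name, entries in blogs.items()
           if any(has_lennon(e) for e in entries)
           and any(is_2008(e) for e in entries)]

print(single_call)  # ['Blog 1']
print(chained)      # ['Blog 1', 'Blog 2']
```

"Blog 2" only appears in the chained result because its two conditions are met by two different entries.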
The case in which the results of a multiple-arguments filter query differ from those of a chained filter query is the following:
Selecting referenced objects on the basis of referencing objects, where the relationship is one-to-many (or many-to-many).
Multiple filters:
Referenced.filter(referencing1_a=x, referencing1_b=y)
# same referencing model ^^ ^^
Chained filters:
Referenced.filter(referencing1_a=x).filter(referencing1_b=y)
The two queries can return different results if more than one row in the referencing model Referencing1 can refer to the same row in the referenced model Referenced. This is the case when Referenced and Referencing1 have a 1:N (one-to-many) or N:M (many-to-many) relationship.
Example:
Consider that my application my_company has two models, Employee and Dependent. An employee in my_company can have more than one dependent (in other words, a dependent can be the son/daughter of a single employee, while an employee can have more than one son/daughter).
I am assuming that a husband and wife can't both work at my_company, so I took a 1:N example.
So Employee is the referenced model, which can be referenced by more than one Dependent, the referencing model. Now consider the following relation state:
Employee:        Dependent:
+------+         +------+--------+-------------+--------------+
| name |         | name | E-name | school_mark | college_mark |
+------+         +------+--------+-------------+--------------+
| A    |         | a1   | A      | 79          | 81           |
| B    |         | b1   | B      | 80          | 60           |
+------+         | b2   | B      | 68          | 86           |
                 +------+--------+-------------+--------------+
Dependent a1 refers to employee A, and dependents b1 and b2 refer to employee B.
Now my query is:
Find all employees who have a son/daughter with distinction marks (say >= 75%) in both college and school.
>>> Employee.objects.filter(dependent__school_mark__gte=75,
... dependent__college_mark__gte=75)
[<Employee: A>]
The output is 'A' because dependent 'a1', who has distinction marks in both college and school, is a dependent of employee 'A'. Note that 'B' is not selected because neither of B's children has distinction marks in both college and school. Relational algebra:
Employee ⋈ (school_mark >= 75 AND college_mark >= 75) Dependent
In the second case, I need this query:
Find all employees some of whose dependents have distinction marks in college and school.
>>> Employee.objects.filter(
... dependent__school_mark__gte=75
... ).filter(
... dependent__college_mark__gte=75)
[<Employee: A>, <Employee: B>]
This time 'B' is also selected because 'B' has two children (more than one!): one has a distinction mark in school ('b1') and the other has a distinction mark in college ('b2').
The order of the filters doesn't matter; we can also write the above query as:
>>> Employee.objects.filter(
... dependent__college_mark__gte=75
... ).filter(
... dependent__school_mark__gte=75)
[<Employee: A>, <Employee: B>]
The result is the same! In relational algebra this can be written as:
(Employee ⋈(school_mark >=75)Dependent) ⋈(college_mark>=75)Dependent
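The two relational-algebra expressions can be checked against the sample tables with Python's built-in sqlite3 module. The SQL here is hand-written for illustration (with DISTINCT and ORDER BY added to keep the output tidy) and is not the exact SQL that Django emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (name TEXT PRIMARY KEY);
    CREATE TABLE dependent (
        name TEXT PRIMARY KEY,
        e_name TEXT REFERENCES employee(name),
        school_mark INTEGER,
        college_mark INTEGER
    );
    INSERT INTO employee VALUES ('A'), ('B');
    INSERT INTO dependent VALUES
        ('a1', 'A', 79, 81),
        ('b1', 'B', 80, 60),
        ('b2', 'B', 68, 86);
""")

# One join: both conditions apply to the same dependent row.
one_join = conn.execute("""
    SELECT DISTINCT e.name FROM employee e
    JOIN dependent d ON d.e_name = e.name
    WHERE d.school_mark >= 75 AND d.college_mark >= 75
    ORDER BY e.name
""").fetchall()

# Two joins: each condition applies to its own copy of the dependent table.
two_joins = conn.execute("""
    SELECT DISTINCT e.name FROM employee e
    JOIN dependent d1 ON d1.e_name = e.name
    JOIN dependent d2 ON d2.e_name = e.name
    WHERE d1.school_mark >= 75 AND d2.college_mark >= 75
    ORDER BY e.name
""").fetchall()

print(one_join)   # [('A',)]
print(two_joins)  # [('A',), ('B',)]
```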
Note the following:
dq1 = Dependent.objects.filter(college_mark__gte=75, school_mark__gte=75)
dq2 = Dependent.objects.filter(college_mark__gte=75).filter(school_mark__gte=75)
Both return the same result: [<Dependent: a1>]
I checked the SQL generated by Django using print dq1.query and print dq2.query; both are the same (Django 1.6).
Semantically, though, they look different to me: the first looks like a simple selection σ[school_mark >= 75 AND college_mark >= 75](Dependent) and the second like a slower nested query: σ[school_mark >= 75](σ[college_mark >= 75](Dependent)).
If anyone needs the code: #codepad
By the way, this is covered in the documentation under "Spanning multi-valued relationships"; I have just added an example, which I hope will be helpful for someone new.
Most of the time, there is only one possible set of results for a query.
The use for chaining filters comes when you are dealing with m2m:
Consider this:
# will return all Model with m2m field 1
Model.objects.filter(m2m_field=1)
# will return Model with both 1 AND 2
Model.objects.filter(m2m_field=1).filter(m2m_field=2)
# this will NOT work
Model.objects.filter(Q(m2m_field=1) & Q(m2m_field=2))
Other examples are welcome.
This answer is based on Django 3.1.
Environment
Models
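from django.db import models

class Blog(models.Model):
    blog_id = models.CharField(max_length=100)

class Post(models.Model):
    # The field class is ForeignKey; on_delete is required since Django 2.0
    blog_id = models.ForeignKey(Blog, on_delete=models.CASCADE)
    title = models.CharField(max_length=100)
    pub_year = models.CharField(max_length=100)  # Don't use CharField for date in production =]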
Database tables
Filters call
Blog.objects.filter(post__title="Title A", post__pub_year="2020")
# Result: <QuerySet [<Blog: 1>]>
Blog.objects.filter(post__title="Title A").filter(post__pub_year="2020")
# Result: <QuerySet [<Blog: 1>, <Blog: 2>]>
Explanation
Before going further, I should note that this answer applies to filtering across a "ManyToManyField" or a reverse "ForeignKey".
If you are filtering on the same table or across a "OneToOneField", there is no difference between a "Multiple Arguments Filter" and a "Filter-chain". Both work like an "AND" condition filter.
The straightforward way to remember the difference: in a "ManyToManyField" or reverse "ForeignKey" filter, "Multiple Arguments Filter" is an "AND" condition on a single related row, while "Filter-chain" behaves loosely like an "OR" condition, because each filter may be satisfied by a different related row.
The reason "Multiple Arguments Filter" and "Filter-chain" are so different is that they fetch results from different join tables and use different conditions in the query statement.
"Multiple Arguments Filter" uses "Post"."Public_Year" = '2020' to identify the public year:
SELECT *
FROM "Book"
INNER JOIN "Post" ON ("Book"."id" = "Post"."book_id")
WHERE "Post"."Title" = 'Title A'
AND "Post"."Public_Year" = '2020'
"Filter-chain" uses "T1"."Public_Year" = '2020' to identify the public year:
SELECT *
FROM "Book"
INNER JOIN "Post" ON ("Book"."id" = "Post"."book_id")
INNER JOIN "Post" T1 ON ("Book"."id" = "T1"."book_id")
WHERE "Post"."Title" = 'Title A'
AND "T1"."Public_Year" = '2020'
But why do different conditions affect the result?
I believe most of us who come to this page, including me =], start with the same assumption about "Multiple Arguments Filter" and "Filter-chain".
We assume the result is fetched from a table like the following one, which is correct for "Multiple Arguments Filter". So if you use the "Multiple Arguments Filter", you get the result you expect.
But for the "Filter-chain", Django creates a different query statement, which changes the above table to the following one. Also, because of the query statement change, the "Public Year" is identified under the "T1" section instead of the "Post" section.
But where does this weird "Filter-chain" join table diagram come from?
I'm not a database expert. The explanation below is what I understand so far after I created the exact structure of the database and made a test with the same query statement.
The following diagram shows where this weird "Filter-chain" join table comes from.
The database first creates a join table by matching the rows of the "Blog" and "Post" tables one by one.
After that, the database repeats the same matching process, but matches the step 1 result table against the "T1" table, which is just the same "Post" table again.
And this is where this weird "Filter-chain" join table diagram comes from.
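The two matching steps can be reproduced in a few lines of Python. The post rows below are invented stand-ins for the missing table images, and the tuple layout (blog_id, title, pub_year) is an assumption made for the sketch.

```python
from itertools import product

# Hypothetical rows: (blog_id, title, pub_year). Invented for illustration.
posts = [
    (1, "Title A", "2020"),
    (2, "Title A", "2018"),
    (2, "Title B", "2020"),
]

# Second join: pair every Post row with every Post row (alias T1) of the same blog.
joined = [(p, t1) for p, t1 in product(posts, posts) if p[0] == t1[0]]

# WHERE "Post"."Title" = 'Title A' AND "T1"."Public_Year" = '2020'
matches = [(p, t1) for p, t1 in joined
           if p[1] == "Title A" and t1[2] == "2020"]

for p, t1 in matches:
    print(p, t1)

blog_ids = sorted({p[0] for p, _ in matches})
print(blog_ids)  # [1, 2]
```

Blog 2 matches only through the pair made of two different posts, which is exactly the extra row the second join introduces.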
Conclusion
So two things make "Multiple Arguments Filter" and "Filter-chain" different.
Django creates different query statements for them, so their results come from different join tables.
The "Filter-chain" query statement checks a condition against a different place (the "T1" alias) than "Multiple Arguments Filter".
The dirty way to remember it: in a "ManyToManyField" or reverse "ForeignKey" filter, "Multiple Arguments Filter" is an "AND" condition on one related row, while "Filter-chain" behaves loosely like an "OR" condition, since each filter may match a different related row.
The performance difference is huge. Try it and see.
Model.objects.filter(condition_a).filter(condition_b).filter(condition_c)
is surprisingly slow compared to
Model.objects.filter(condition_a, condition_b, condition_c)
As mentioned in Effective Django ORM,
QuerySets maintain state in memory
Chaining triggers cloning, duplicating that state
Unfortunately, QuerySets maintain a lot of state
If possible, don’t chain more than one filter
You can use the connection module to see the raw SQL queries and compare them. As Yuji explained, for the most part they are equivalent, as shown here:
>>> from django.db import connection
>>> samples1 = Unit.objects.filter(color="orange", volume=None)
>>> samples2 = Unit.objects.filter(color="orange").filter(volume=None)
>>> list(samples1)
[]
>>> list(samples2)
[]
>>> for q in connection.queries:
...     print(q['sql'])
...
SELECT `samples_unit`.`id`, `samples_unit`.`color`, `samples_unit`.`volume` FROM `samples_unit` WHERE (`samples_unit`.`color` = orange AND `samples_unit`.`volume` IS NULL)
SELECT `samples_unit`.`id`, `samples_unit`.`color`, `samples_unit`.`volume` FROM `samples_unit` WHERE (`samples_unit`.`color` = orange AND `samples_unit`.`volume` IS NULL)
>>>
If you end up on this page looking for how to dynamically build up a Django queryset with multiple chained filters, but you need the filters to be of the AND type instead of OR, consider using Q objects.
An example:
from django.db.models import Q

# First filter by type.
filters = Q()  # an empty Q adds no condition, so "&" below is always safe
if param in CARS:
    objects = app.models.Car.objects
    filters = Q(tire=param)
elif param in PLANES:
    objects = app.models.Plane.objects
    filters = Q(wing=param)
# Now filter by location.
if location == 'France':
    filters = filters & Q(quay=location)
elif location == 'England':
    filters = filters & Q(harbor=location)
# Finally, generate the actual queryset
queryset = objects.filter(filters)
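A related sketch, assuming plain keyword lookups are enough: collect the lookups in a dict and pass them to a single filter() call with **, since keyword arguments within one filter() call are combined with AND. The helper name build_lookups and the contents of CARS and PLANES are hypothetical; only the dict-building part is shown so that it runs without Django.

```python
# Hypothetical stand-ins for the CARS / PLANES collections used above.
CARS = {"radial", "slick"}
PLANES = {"delta", "swept"}

def build_lookups(param, location):
    """Collect field lookups that one filter(**lookups) call would AND together."""
    lookups = {}
    if param in CARS:
        lookups["tire"] = param
    elif param in PLANES:
        lookups["wing"] = param
    if location == "France":
        lookups["quay"] = location
    elif location == "England":
        lookups["harbor"] = location
    return lookups

lookups = build_lookups("slick", "France")
print(lookups)  # {'tire': 'slick', 'quay': 'France'}
# With Django available, this would be used as: objects.filter(**lookups)
```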
If a single related object must satisfy both a and b, then
and_query_set = Model.objects.filter(a=a, b=b)
If a and b may each be satisfied by a different related object, then
chained_query_set = Model.objects.filter(a=a).filter(b=b)
Official Documents:
https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships
Related Post: Chaining multiple filter() in Django, is this a bug?
There is a difference when you filter on a related object, for example:
class Book(models.Model):
    author = models.ForeignKey(Author)
    name = models.ForeignKey(Region)

class Author(models.Model):
    name = models.ForeignKey(Region)
The request
Author.objects.filter(Q(book__name='name1'), Q(book__name='name2'))
returns an empty set (the conditions are wrapped in Q objects because Python does not allow the same keyword argument twice), and the request
Author.objects.filter(book__name='name1').filter(book__name='name2')
returns authors that have books with both 'name1' and 'name2'.
For details, see
https://docs.djangoproject.com/en/dev/topics/db/queries/#s-spanning-multi-valued-relationships

Timeseries differencing - ArangoDB (AQL or Python)

I have a collection which holds documents, with each document having a data observation and the time that the data was captured.
e.g.
{
_key:....,
"data":26,
"timecaptured":1643488638.946702
}
where timecaptured is, for now, a UTC timestamp.
What I want is the duration between consecutive observations, i.e. the difference in timestamps between two documents in time order. In SQL I could do this with LAG, for example, but I am struggling to see how to do it in the database with ArangoDB and AQL. I have a lot of data and I don't really want to pull it all into pandas.
Any help really appreciated.
Although the solution provided by CodeManX works, I prefer a different one:
FOR d IN docs
  SORT d.timecaptured
  WINDOW { preceding: 1 } AGGREGATE s = SUM(d.timecaptured), cnt = COUNT(1)
  LET timediff = cnt == 1 ? null : d.timecaptured - (s - d.timecaptured)
  RETURN timediff
We simply calculate the sum of the previous and the current document, and by subtracting the current document's timecaptured we can therefore calculate the timecaptured of the previous document. So now we can easily calculate the requested difference.
I only use the COUNT to return null for the first document (which has no predecessor). If you are fine with having a difference of zero for the first document, you can simply remove it.
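The arithmetic of the SUM/COUNT trick can be followed step by step in plain Python. This only emulates a window of the previous and current rows over invented sample values; no database is involved.

```python
# Emulates WINDOW { preceding: 1 } with SUM and COUNT over sorted timestamps.
# The sample values are invented; no database is involved.
timestamps = sorted([1643488640.0, 1643488638.5, 1643488645.0])

diffs = []
for i, current in enumerate(timestamps):
    window = timestamps[max(0, i - 1): i + 1]  # previous row (if any) + current row
    s, cnt = sum(window), len(window)
    previous = s - current                     # recover the previous timestamp
    diffs.append(None if cnt == 1 else current - previous)

print(diffs)  # [None, 1.5, 5.0]
```

One thing to keep in mind with real data: adding and then subtracting large floating-point timestamps can introduce small rounding errors in the last digits of the difference.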
However, neither approach is very straightforward or obvious. I have put it on my TODO list to add an APPEND aggregate function that could be used in WINDOW and COLLECT operations.
The WINDOW function doesn't give you direct access to the data in the sliding window but here is a rather clever workaround:
FOR doc IN collection
  SORT doc.timecaptured
  WINDOW { preceding: 1 }
    AGGREGATE d = UNIQUE(KEEP(doc, "_key", "timecaptured"))
  LET timediff = doc.timecaptured - d[0].timecaptured
  RETURN MERGE(doc, {timediff})
The UNIQUE() function is available for window aggregations and can be used to get at the desired data (previous document). Aggregating full documents might be inefficient, so a projection should do, but remember that UNIQUE() will remove duplicate values. A document _key is unique within a collection, so we can add it to the projection to make sure that UNIQUE() doesn't remove anything.
The time difference is calculated by subtracting the previous document's timecaptured value from the current document's. In the case of the first record, d[0] is actually equal to the current document and the difference ends up being 0, which I think is sensible. You could also write d[-1].timecaptured - d[0].timecaptured to achieve the same. d[1].timecaptured - d[0].timecaptured, on the other hand, will give you the inverted timestamp for the first record because d[1] is null (no previous document) and evaluates to 0.
There is one risk: UNIQUE() may alter the order of the documents. You could use a subquery to sort by timecaptured again:
LET timediff = doc.timecaptured - (
FOR dd IN d SORT dd.timecaptured LIMIT 1 RETURN dd.timecaptured
)[0]
But it's not great for performance to use a subquery. Instead, you can use the aggregation variable d to access both documents and calculate the absolute value of the subtraction so that the order doesn't matter:
LET timediff = ABS(d[-1].timecaptured - d[0].timecaptured)

How to use Django iterator with value list?

I have Profile table with a huge number of rows. I was trying to filter out profiles based on super_category and account_id (these are the fields in the model Profile).
Assume I have a list of ids in the form of bulk_account_ids and super_categories
list_of_ids = Profile.objects.filter(account_id__in=bulk_account_ids, super_category__in=super_categories).values_list('id', flat=True)
list_of_ids = list(list_of_ids)
SomeTask.delay(ids=list_of_ids)
This particular query is timing out while it gets evaluated in the second line.
Can I use .iterator() at the end of the query to optimize this, i.e. list(list_of_ids.iterator())? If not, what else can I do?

loopback relational database hasManyThrough pivot table

I seem to be stuck on a classic ORM issue and don't know really how to handle it, so at this point any help is welcome.
Is there a way to get the pivot table on a hasManyThrough query? Better yet, apply some filter or sort to it. A typical example
Table products
id,title
Table categories
id,title
table products_categories
productsId, categoriesId, orderBy, main
So, in the above scenario, say you want to get all categories of product X that are main (main = true), or you want to sort the product categories by orderBy.
What happens now is a first SELECT on products to get the product data, a second SELECT on products_categories to get the categoriesId and a final SELECT on categories to get the actual categories. Ideally, filters and sort should be applied to the 2nd SELECT like
SELECT `id`,`productsId`,`categoriesId`,`orderBy`,`main` FROM `products_categories` WHERE `productsId` IN (180) AND `main` = 1 ORDER BY `orderBy` DESC
Another typical example would be ordering the product images the way the user wants them,
so you would have a products_images table
id,image,productsId,orderBy
and you would want to
SELECT * FROM products_images WHERE productsId IN (180) ORDER BY orderBy ASC
Is that even possible?
EDIT : Here is the relationship needed for an intermediate table to get what I need based on my schema.
Products.hasMany(Images,
  {
    as: "Images",
    "foreignKey": "productsId",
    "through": ProductsImagesItems,
    scope: function (inst, filter) {
      return {active: 1};
    }
  });
The thing is, the scope function gives me access to the final result and not to the intermediate table.
I am not sure I fully understand your problem(s), but you do need to move away from the table concept and express your problem in terms of Models and Relations.
The way I see it, you have two models Product(properties: title) and Category (properties: main).
Then, you can have relations between the two, potentially
Product belongsTo Category
Category hasMany Product
This means a product will belong to a single category, while a category may contain many products. There are other relations available.
Then, using the generated REST API, you can filter GET requests to get items according to their properties (like main in your case), or use custom GET requests (automatically generated when you add relations) to get, for instance, all products belonging to a specific category.
Does this help?
Based on what you have here I'd probably recommend using the scope option when defining the relationship. The LoopBack docs show a very similar example of the "product - category" scenario:
Product.hasMany(Category, {
as: 'categories',
scope: function(instance, filter) {
return { type: instance.type };
}
});
In the example above, instance is a category that is being matched, and each product would have a new categories property that would contain the matching Category entities for that Product. Note that this does not follow your exact data scheme, so you may need to play around with it. Also, I think your API query would have to specify that you want the categories related data loaded (those are not included by default):
/api/Products/13?filter={"include":["categories"]}
I suggest you define a custom / remote method in Product.js that does the work for you.
Product.getCategories = function (_productId) {
  // if you are taking product title as param instead of _productId,
  // you will first need to find the product ID
  // then execute a find query on products_categories with
  // 1. where filter to get only main categories and productId = _productId
  // 2. include filter to include product and category objects
  // 3. orderBy filter to sort items based on orderBy column
  // now you will get an array of products_categories.
  // Each item / object in the array will have nested objects of Product and Category.
};
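The query those steps describe can be sketched independently of LoopBack. The following uses Python's built-in sqlite3 module with the table and column names from the question; the sample rows and the function name main_categories are invented, and this is not LoopBack code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE categories (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE products_categories (
        productsId INTEGER, categoriesId INTEGER, orderBy INTEGER, main INTEGER
    );
    INSERT INTO products VALUES (180, 'Product X');
    INSERT INTO categories VALUES (1, 'Shoes'), (2, 'Sale'), (3, 'Outdoor');
    INSERT INTO products_categories VALUES
        (180, 1, 2, 1), (180, 2, 1, 0), (180, 3, 3, 1);
""")

def main_categories(product_id):
    """Categories of one product, filtered and sorted on pivot-table columns."""
    return conn.execute("""
        SELECT c.id, c.title, pc.orderBy
        FROM products_categories pc
        JOIN categories c ON c.id = pc.categoriesId
        WHERE pc.productsId = ? AND pc.main = 1
        ORDER BY pc.orderBy DESC
    """, (product_id,)).fetchall()

print(main_categories(180))  # [(3, 'Outdoor', 3), (1, 'Shoes', 2)]
```

The filter and the sort both act on the pivot table, which is the behaviour the question asks for.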

SSAS How to scope a calculated member measure to a number of specific members?

I'm trying to create a calculated member measure for a subset of a group of locations. All other members should be null. I can limit the scope, but then in the client (Excel in this case) the measure does not present the grand total ([Group].[Group].[All]).
CREATE MEMBER CURRENTCUBE.[Measures].[Calculated Measure]
AS (
    Null
),
FORMAT_STRING = "$#,##0.00;-$#,##0.00",
NON_EMPTY_BEHAVIOR = { [Measures].[Places] }
,VISIBLE = 1 , ASSOCIATED_MEASURE_GROUP = 'Locations';
-----------------------------------------------------------------------------------------
SCOPE ({([Group].[Group].&[location 1]),
        ([Group].[Group].&[location 2]),
        ([Group].[Group].&[location 3]),
        ([Group].[Group].&[location 4]),
        ([Group].[Group].&[location 5])
       }, [Measures].[Calculated Measure]);
    // Location Calculations
    THIS = (
        [Measures].[Adjusted Dollars] - [Measures].[Adjusted Dollars by Component] + [Measures].[Adjusted OS Dollars]
    );
END SCOPE;
It's as though the [Group].[Group].[All] member is outside of the scope so it won't aggregate the rest of the members. Any help would be appreciated.
Your calculation is applied after all calculations already happened. You can get around this by adding Root([Time]) to the scope, assuming your time dimension is named [Time]. And if you want to aggregate across more dimensions, you would have to add them all to the SCOPE.
In most cases, when you have a calculation that you want to do before aggregation, it is easier to define the calculation in the DSV, e.g. with an expression like
CASE WHEN group_location in(1, 2, 3, 4) THEN
Adjusted_dollars - adjusted_dollars_by_comp + adjusted_os_dollars
ELSE NULL
END
and just make a standard aggregatable measure from it.
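Why the row-level approach gives a correct grand total can be seen in a small Python sketch. The fact rows are invented; the column names mirror the CASE expression above.

```python
# Hypothetical fact rows; column names mirror the CASE expression above.
rows = [
    {"group_location": 1, "adjusted_dollars": 100.0,
     "adjusted_dollars_by_comp": 20.0, "adjusted_os_dollars": 5.0},
    {"group_location": 3, "adjusted_dollars": 50.0,
     "adjusted_dollars_by_comp": 10.0, "adjusted_os_dollars": 0.0},
    {"group_location": 9, "adjusted_dollars": 70.0,
     "adjusted_dollars_by_comp": 30.0, "adjusted_os_dollars": 2.0},
]

def row_value(row):
    """Row-level equivalent of the CASE expression: None outside the chosen locations."""
    if row["group_location"] in (1, 2, 3, 4):
        return (row["adjusted_dollars"]
                - row["adjusted_dollars_by_comp"]
                + row["adjusted_os_dollars"])
    return None

values = [row_value(r) for r in rows]
# A SUM measure skips NULLs, so the grand total only contains the chosen locations.
grand_total = sum(v for v in values if v is not None)

print(values)       # [85.0, 40.0, None]
print(grand_total)  # 125.0
```

Because the value is computed per row before aggregation, the All member simply sums the non-null rows and no SCOPE assignment is needed.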
I've searched high and low for this answer. After reading the above suggestion, I came up with this:
Calculate the measure in a column in the source view table:
(isnull(a,0) - isnull(b,0)) + isnull(c,0) = x
Add the unfiltered calculated column (x) to the DSV.
Create a named calculation in the DSV that uses a CASE statement to filter the original calculated measure: CASE WHEN location IN (1, 2, 3) THEN x ELSE NULL END
Add the named calculation as a measure.
I chose to do it this way to capture the unfiltered measure first; then, if another filter needs to be added or one needs to be taken off, I can do so without touching the views again. I just add the new member to filter by to the CASE statement of my named calculation. I also tried putting the calculation directly into a named calculation in the DSV that filtered it as well, but that produced incorrect results.

Resources