I have a PivotTable that comes from the following table:
+-+---------+--+-----+
| |A        |B |C    |
+-+---------+--+-----+
|1|Date     |Id|Value|
+-+---------+--+-----+
|2|4/01/2013|1 |4    |
+-+---------+--+-----+
|3|4/01/2013|2 |5    |
+-+---------+--+-----+
|4|4/01/2013|1 |20   |
+-+---------+--+-----+
|5|4/02/2013|2 |20   |
+-+---------+--+-----+
|6|4/02/2013|1 |15   |
+-+---------+--+-----+
And I want to aggregate first by Id and then by Date, using Max to aggregate by Id and then Sum to aggregate by Date. The resulting table would look like this:
+-+---------+-----------------+
| |A        |B                |
+-+---------+-----------------+
|1|Date     |Sum(Max(Id,Date))|
+-+---------+-----------------+
|2|4/01/2013|25               |
+-+---------+-----------------+
|3|4/02/2013|35               |
+-+---------+-----------------+
The 25 above comes from taking the Max per Id per Date: Max(1, 4/01/2013) -> 20 and Max(2, 4/01/2013) -> 5, so the Sum of those Max values is 25.
I can do the two levels of aggregation easily by adding the Date and Id columns into the Rows section of the PivotTable, but when choosing an aggregation function for Value, I can either choose Max, getting a Max of Max, or Sum, getting a Sum of Sum. That is, I cannot get a Sum of Max.
Do you know how to achieve this? Ideally, the solution would not be to compute a PivotTable and then copy from there or get a formula, because that would break easily if I want to dynamically change fields.
Thanks!
This is how I would do it in SQL:
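SELECT DATE, SUM(MAXED_VAL) AS SummedMaxedVal
FROM (
    -- Inner query: Max of VALUE per (DATE, ID)
    SELECT DATE, ID, MAX(VALUE) AS MAXED_VAL
    FROM table
    GROUP BY DATE, ID
) per_id  -- most databases require an alias on a derived table
GROUP BY DATE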
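For comparison, the same two-step aggregation can be sketched in plain Python. This is only an illustration of the logic, not a PivotTable solution; it assumes the rows are available as (date, id, value) tuples, and the names `rows`, `max_per_id` and `sum_of_max` are made up for the example.

```python
from collections import defaultdict

# Sample rows from the question: (date, id, value).
rows = [
    ("4/01/2013", 1, 4),
    ("4/01/2013", 2, 5),
    ("4/01/2013", 1, 20),
    ("4/02/2013", 2, 20),
    ("4/02/2013", 1, 15),
]

# Step 1: Max of Value per (Date, Id), like the inner GROUP BY.
max_per_id = {}
for date, id_, value in rows:
    key = (date, id_)
    max_per_id[key] = max(value, max_per_id.get(key, value))

# Step 2: Sum of those maxima per Date, like the outer GROUP BY.
sum_of_max = defaultdict(int)
for (date, _id), max_value in max_per_id.items():
    sum_of_max[date] += max_value

print(dict(sum_of_max))
```

With the sample data this gives 25 for 4/01/2013 and 35 for 4/02/2013, matching the expected table.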
I have two tables, each comprised of one column. Table One [DATES] is a list of ordered months. Table Two [DEPARTMENTS] is a unique list of departments.
I want to combine the two tables, repeating the list of departments for every month in the [DATES] table.
Example:
Table 1 [DATES]:
|MONTH |
|1/31/2022 |
|2/28/2022 |
|3/31/2022 |
Table 2 [Departments]
|DEPARTMENT|
|A |
|B |
|C |
How I want it to look:
|MONTH |DEPARTMENT|
|1/31/2022 |A |
|1/31/2022 |B |
|1/31/2022 |C |
|2/28/2022 |A |
|2/28/2022 |B |
|2/28/2022 |C |
|3/31/2022 |A |
|3/31/2022 |B |
|3/31/2022 |C |
I am not sure this is possible, as a join requires at least one column of equivalent type to join on.
Is this doable?
In Table1, add a custom column (Add Column > Custom Column) with the formula
=Table2
Then expand the new column, so that each department becomes its own row.
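What the steps above produce is a cross join: every month paired with every department, with no key column needed. As a rough illustration of that idea outside Power Query, here is a minimal Python sketch; the list names `months`, `departments` and `combined` are assumptions for the example.

```python
from itertools import product

# The two single-column tables from the question.
months = ["1/31/2022", "2/28/2022", "3/31/2022"]
departments = ["A", "B", "C"]

# Pair every month with every department; months vary slowest,
# so the department list repeats once per month.
combined = list(product(months, departments))

for month, department in combined:
    print(month, department)
```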
I'm trying to figure out how to get the Max date from a column DateCol that is no later than a variable run_time.
For example, from the input table below, I want to compare DateCol with run_time. Suppose run_time = '2022-03-05'; I'd like to select the third row, as that's where DateCol has its Max value with DateCol <= '2022-03-05'. How can this be done? Many thanks.
+---+------------+
| ID|     DateCol|
+---+------------+
|  1|'2022-03-01'|
|  2|'2022-03-03'|
|  3|'2022-03-04'|
|  4|'2022-03-06'|
+---+------------+
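Just filter on run_time, then sort by date in descending order and take the first record.
from pyspark.sql import functions as F

run_time = '2022-03-05'
(df
    .where(F.col('DateCol') <= run_time)  # keep dates up to run_time
    .orderBy(F.desc('DateCol'))           # latest date first
    .limit(1)
    .show()
)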
+---+----------+
| ID| DateCol|
+---+----------+
| 3|2022-03-04|
+---+----------+
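The filter-then-take-the-latest logic does not depend on Spark. A minimal plain-Python sketch of the same idea, assuming the dates are ISO-formatted strings (which compare correctly as text) and using made-up names `rows`, `candidates` and `latest`:

```python
# Sample rows from the question: (ID, DateCol).
rows = [
    (1, "2022-03-01"),
    (2, "2022-03-03"),
    (3, "2022-03-04"),
    (4, "2022-03-06"),
]
run_time = "2022-03-05"

# Keep only rows dated on or before run_time.
candidates = [row for row in rows if row[1] <= run_time]

# Take the row with the latest date; None if nothing qualifies.
latest = max(candidates, key=lambda row: row[1], default=None)

print(latest)
```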
I want to partition by three columns in my query:
user id
cancelation month year
retention month year
I used row_number with partition by as follows:
row_number() over (
    partition by user_id,
        cast(date_format(cancelation_date, 'yyyyMM') as integer),
        cast(date_format(retention_date, 'yyyyMM') as integer)
    order by cast(date_format(cancelation_date, 'yyyyMM') as integer) asc,
        cast(date_format(retention_date, 'yyyyMM') as integer) asc
) as row_count
Example of the output I got:
| user_id |cancelation_date |cancelation_month_year|retention_date|retention_month_year|row_count|
| -------- | -------------- |----------------------|--------------|--------------------|---------|
| 566 | 28-5-2020 | 202005 | 20-7-2020 | 202007 |1 |
| 566 | 28-5-2020 | 202005 | 30-7-2-2020 | 202007 |2 |
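Example of the output I want to get:
| user_id |cancelation_date |cancelation_month_year|retention_date|retention_month_year|row_count|
| -------- | -------------- |----------------------|--------------|--------------------|---------|
| 566 | 28-5-2020 | 202005 | 20-7-2020 | 202007 |1 |
| 566 | 28-5-2020 | 202005 | 30-7-2-2020 | 202007 |1 |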
Note that a user may have more than one cancelation month. For example, if he canceled in August, I want row_count = 2 for all dates in August, and so on.
It's not obvious why partition by is partitioning by retention date instead of by retention month year.
I get the impression that row_number is not what you want; rather, you are interested in dense_rank, which would give your expected output. Presumably that means partitioning by user_id only and ordering by the month-year columns, so that rows in the same month tie on the same rank.
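To show what dense ranking does here, the following is a small plain-Python sketch, not the SQL itself. It assumes the rank is computed per user_id over the (cancelation month, retention month) pair; the third row (August) and the function name `add_dense_rank` are hypothetical additions for illustration.

```python
from collections import defaultdict

rows = [
    {"user_id": 566, "cancelation_month_year": 202005, "retention_month_year": 202007},
    {"user_id": 566, "cancelation_month_year": 202005, "retention_month_year": 202007},
    # Hypothetical later cancelation, not in the original sample.
    {"user_id": 566, "cancelation_month_year": 202008, "retention_month_year": 202009},
]


def month_key(row):
    """Ordering key: cancelation month first, then retention month."""
    return (row["cancelation_month_year"], row["retention_month_year"])


def add_dense_rank(rows):
    """Return copies of rows with a dense rank per user_id in 'row_count'."""
    # Collect the distinct keys seen for each user.
    keys_per_user = defaultdict(set)
    for row in rows:
        keys_per_user[row["user_id"]].add(month_key(row))

    # Number the distinct keys 1, 2, 3, ... in sorted order (no gaps).
    rank_lookup = {
        user: {key: rank for rank, key in enumerate(sorted(keys), start=1)}
        for user, keys in keys_per_user.items()
    }
    return [
        {**row, "row_count": rank_lookup[row["user_id"]][month_key(row)]}
        for row in rows
    ]


ranked = add_dense_rank(rows)
print([row["row_count"] for row in ranked])
```

Rows that share the same month pair get the same number, and the next distinct month pair gets the next number, which is the behavior the expected output describes.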
I have Sale Invoices for bread, jam, etc. like this table.
+-------+--------+-------+
| Item | Date | Price |
+-------+--------+-------+
| Bread | 1-Dec | 5 |
+-------+--------+-------+
| Jam | 1-Dec | 5 |
+-------+--------+-------+
| Bread | 8-Dec | 6 |
+-------+--------+-------+
| Jam | 8-Dec | 4 |
+-------+--------+-------+
| Bread | 15-Dec | 4 |
+-------+--------+-------+
| Jam | 15-Dec | 7 |
+-------+--------+-------+
I want the date of the highest price for each item, like this:
+-------+--------+---------------+
| Item | Date | Highest Price |
+-------+--------+---------------+
| Bread | 8-Dec | 6 |
+-------+--------+---------------+
| Jam | 15-Dec | 7 |
+-------+--------+---------------+
It is like finding Max values depending on lookup values. It is very much like Group By and Max in SQL. How do I do it in Excel? I've tried INDEX/MATCH and also googling. Nothing helps. Please help me.
This is typically done through a pivot table:
Select your data.
Insert a pivot table.
In your case use "Item" & "Date" as rows.
In your case use "Price" as value.
Then click "Price" and under its field settings choose "Max".
Then, in the pivot table itself, right-click any date, click "Filter" > "Top 10", and set it to the top 1 based on the max price.
There are many ways to do this through formulae, but if one has Excel O365 it can be done through one single formula, for example:
Formula in E2:
=TRANSPOSE(CHOOSE({1,2,3},TRANSPOSE(UNIQUE(A2:A7)),TRANSPOSE(MINIFS(B2:B7,A2:A7,UNIQUE(A2:A7),C2:C7,MAXIFS(C2:C7,A2:A7,UNIQUE(A2:A7)))),TRANSPOSE(MAXIFS(C2:C7,A2:A7,UNIQUE(A2:A7)))))
Or:
=FILTER(A2:C7,ISNUMBER(MATCH(A2:A7,UNIQUE(A2:A7),0))*LET(X,MAXIFS(C2:C7,A2:A7,UNIQUE(A2:A7)),ISNUMBER(MATCH(C2:C7,MAXIFS(C2:C7,A2:A7,UNIQUE(A2:A7))))*ISNUMBER(MATCH(B2:B7,MINIFS(B2:B7,A2:A7,UNIQUE(A2:A7),C2:C7,X),0))))
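The logic behind both approaches (group by item, keep the row with the maximum price) can be sketched in a few lines of Python. This is only an illustration of the technique, with made-up names `invoices` and `best`; on a tie it keeps the first row seen.

```python
# Sample invoices from the question: (item, date, price).
invoices = [
    ("Bread", "1-Dec", 5),
    ("Jam", "1-Dec", 5),
    ("Bread", "8-Dec", 6),
    ("Jam", "8-Dec", 4),
    ("Bread", "15-Dec", 4),
    ("Jam", "15-Dec", 7),
]

# For each item, remember the (date, price) of the highest price so far.
best = {}
for item, date, price in invoices:
    if item not in best or price > best[item][1]:
        best[item] = (date, price)

for item, (date, price) in best.items():
    print(item, date, price)
```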
I am trying to create an MDX measure in Excel (in OLAP Tools) that will count how many members there are for every other item in another dimension. As I don't know the exact syntax and notation for MDX and OLAP cubes I will try to simply explain what I want to do:
I have a pivot table based on an OLAP Cube. I have a Machine Number field stored in one dimension, that is the "parent" and for every machine number there is a number of articles that were produced (in certain period of time). Those articles are represented by Order Numbers. Those numbers are stored in another dimension. I would like the measure to count how many order numbers there are for every machine number.
So the table looks like this:
+------------------+----------------+
| [Machine Number] | [Order Number] |
+------------------+----------------+
| Machine001 | |
| | 111111111 |
| | 222222222 |
| | 333333333 |
| Machine002 | |
| | 444444444 |
| | 555555555 |
| | 666666666 |
| | 777777777 |
+------------------+----------------+
and I would like the result to be:
+------------------+----------------+------------+
| [Machine Number] | [Order Number] | [Measure1] |
+------------------+----------------+------------+
| Machine001 | | 3 |
| | 111111111 | |
| | 222222222 | |
| | 333333333 | |
| Machine002 | | 4 |
| | 444444444 | |
| | 555555555 | |
| | 666666666 | |
| | 777777777 | |
+------------------+----------------+------------+
I've tried using the COUNT function, with EXISTING as well, but it wouldn't work (it always shows 1, or the same wrong number for every machine). I believe that I have to somehow connect those two dimensions so that the Order Number depends on the Machine Number, but lacking knowledge of MDX and OLAP cubes, I don't even know how to ask Google how to do that.
Thanks in advance for any tips and solutions.
Your problem is basically this: you have two attributes in different dimensions. You want to retrieve the valid combinations of these attributes, and then count the number of values available in the second attribute for each value of the first attribute.
Based on the above problem statement: in an OLAP cube, a fact table or a measure defines the relationship between attributes of different dimensions linked to that measure or fact table. Take a look at the example below. (I have used the SSAS sample database Adventure Works.)
-- I am trying to find the promotions that were offered for each product category.
select
[Measures].[Internet Sales Amount]
on columns,
([Product].[Category].[Category],[Promotion].[Promotion].[Promotion])
on rows
from
[Adventure Works]
The result is the cross product of all the product categories and the promotions. Now let's make the cube return only the valid combinations.
select
[Measures].[Internet Sales Amount]
on columns,
nonempty(
([Product].[Category].[Category],[Promotion].[Promotion].[Promotion])
,[Measures].[Internet Sales Amount])
on rows
from
[Adventure Works]
Now we have indicated that it should return only valid combinations. Note that we provided a measure that belongs to the fact table connecting the two dimensions. Now let's count them:
with member
[Measures].[test]
as
count(
nonempty(([Product].[Category].currentmember,[Promotion].[Promotion].[Promotion]),[Measures].[Internet Sales Amount])
)
select
[Measures].[Test]
on columns,
[Product].[Category].[Category]
on rows
from
[Adventure Works]
Alternate query
with member
[Measures].[test]
as
{nonempty(([Product].[Category].currentmember,[Promotion].[Promotion].[Promotion]),[Measures].[Internet Sales Amount]) }.count
select
[Measures].[Test]
on columns,
[Product].[Category].[Category]
on rows
from
[Adventure Works]
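Conceptually, the calculated member counts, for each member of the first attribute, how many members of the second attribute appear together with it in the fact data. A plain-Python sketch of that idea, applied to the machine and order numbers from the question, might look as follows. The `fact_rows` list is a hypothetical stand-in for the fact table, including one deliberately repeated row, and this is not how the cube evaluates the MDX internally.

```python
from collections import defaultdict

# Hypothetical fact rows: each links a machine number to an order number.
fact_rows = [
    ("Machine001", "111111111"),
    ("Machine001", "222222222"),
    ("Machine001", "333333333"),
    ("Machine001", "111111111"),  # repeated fact row, counted once
    ("Machine002", "444444444"),
    ("Machine002", "555555555"),
    ("Machine002", "666666666"),
    ("Machine002", "777777777"),
]

# Collect the distinct order numbers that occur with each machine.
orders_per_machine = defaultdict(set)
for machine, order in fact_rows:
    orders_per_machine[machine].add(order)

# The measure is the size of each machine's set of orders.
measure1 = {machine: len(orders) for machine, orders in orders_per_machine.items()}

print(measure1)
```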