How to aggregate on a column while grouping by several columns values using CoPPer?

I have a dataset with the current stock for some products:
+--------------+-------+
| Product      | Stock |
+--------------+-------+
| chocolate    |   300 |
| coal         |    70 |
| orange juice |   400 |
+--------------+-------+
and the sales for every product over the years for the current month and the next month in another dataset:
+--------------+------+-------+-------+
| Product      | Year | Month | Sales |
+--------------+------+-------+-------+
| chocolate    | 2017 | 05    |    55 |
| chocolate    | 2017 | 04    |   250 |
| chocolate    | 2016 | 05    |    70 |
| chocolate    | 2016 | 04    |   200 |
|              |      |       |       |
| coal         | 2017 | 05    |    40 |
| coal         | 2017 | 04    |    30 |
| coal         | 2016 | 05    |    50 |
| coal         | 2016 | 04    |    20 |
|              |      |       |       |
| orange juice | 2017 | 05    |   400 |
| orange juice | 2017 | 04    |   350 |
| orange juice | 2016 | 05    |   400 |
| orange juice | 2016 | 04    |   300 |
+--------------+------+-------+-------+
I want to compute the stock that I will need to order for the next month, by computing the expected sales over the current month and the next month, using the following formula:
ExpectedSales = max(salesMaxCurrentMonth) + max(salesMaxNextMonth)
The orders will then be
Orders = ExpectedSales * (1 + margin) - Stock
Where margin is, for example, 10%.
I tried to group by several columns using GroupBy, as in the following, but it seems to aggregate by Stock instead of Product:
salesDataset
    .groupBy(Columns.col("Month"), Columns.col("Product"))
    .agg(Columns.max("Sales").as("SalesMaxPerMonth"))
    .agg(Columns.sum("SalesMaxPerMonth").as("SalesPeriod"))
    .withColumn(
        "SalesExpected",
        Columns.col("SalesPeriod").multiply(Columns.literal(1 + margin)))
    .withColumn(
        "Orders",
        Columns.col("SalesExpected").minus(Columns.col("Stock")))
    .withColumn(
        "Orders",
        Columns.col("Orders").map((Double a) -> a >= 0 ? a : 0))
    .doNotAggregateAbove()
    .toCellSet()
    .show();

Your aggregation logic is correct. However, there is another way to build your CellSet: pass a map that describes the location of the query which generates it.
salesDataset
    .groupBy(Columns.col("Month"), Columns.col("Product"))
    .agg(Columns.max("Sales").as("SalesMaxPerMonth"))
    .agg(Columns.sum("SalesMaxPerMonth").as("SalesPeriod"))
    .withColumn(
        "SalesExpected",
        Columns.col("SalesPeriod").multiply(Columns.literal(1 + margin)))
    .withColumn("Orders", Columns.col("SalesExpected").minus(Columns.col("Stock")))
    .withColumn("Orders", Columns.col("Orders").map((Double a) -> a >= 0 ? a : 0))
    .doNotAggregateAbove()
    .toCellSet(
        Empty.<String, Object>map()
            .put("Product", null)
            .put("Stock", null))
    .show();
Where null in a location represents the wildcard *.
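
Independently of CoPPer, the arithmetic behind the formulas in the question can be checked with a few lines of plain Python. This is only a sketch of the intended aggregation (max per product and month, summed per product, then margin and stock applied), not CoPPer code; the function name compute_orders and the in-memory data layout are assumptions made for illustration.

```python
from collections import defaultdict

# Sample data copied from the question.
stock = {"chocolate": 300, "coal": 70, "orange juice": 400}
sales = [
    ("chocolate", 2017, 5, 55), ("chocolate", 2017, 4, 250),
    ("chocolate", 2016, 5, 70), ("chocolate", 2016, 4, 200),
    ("coal", 2017, 5, 40), ("coal", 2017, 4, 30),
    ("coal", 2016, 5, 50), ("coal", 2016, 4, 20),
    ("orange juice", 2017, 5, 400), ("orange juice", 2017, 4, 350),
    ("orange juice", 2016, 5, 400), ("orange juice", 2016, 4, 300),
]
MARGIN = 0.10  # example margin from the question


def compute_orders(sales, stock, margin):
    """Hypothetical helper: Orders = ExpectedSales * (1 + margin) - Stock."""
    # Step 1: maximum sales per (product, month) across all years.
    max_per_month = defaultdict(int)
    for product, _year, month, amount in sales:
        key = (product, month)
        max_per_month[key] = max(max_per_month[key], amount)

    # Step 2: sum the monthly maxima per product (ExpectedSales).
    expected = defaultdict(int)
    for (product, _month), amount in max_per_month.items():
        expected[product] += amount

    # Step 3: apply the margin, subtract the stock, never order a negative amount.
    return {
        product: max(total * (1 + margin) - stock[product], 0.0)
        for product, total in expected.items()
    }


orders = compute_orders(sales, stock, MARGIN)
for product, quantity in orders.items():
    print(f"{product}: {quantity:.1f}")
```

For chocolate, for example, the monthly maxima are 250 (month 04) and 70 (month 05), so the expected sales are 320 and the order is 320 * 1.1 - 300, about 52.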

Related

How to join tables creating a new row if it doesn't exist

I want to join two tables so that a new row is created for every combination that does not exist yet.
I tried: Products.join(Class, Products.id_product == Class.id_product, 'right')
Products
+------------+-------+----------+------------+
| day        | store | quantity | id_product |
+------------+-------+----------+------------+
| 2022-05-05 | 01    | 10       | 1          |
| 2022-05-05 | 01    | 10       | 2          |
| 2022-05-05 | 01    | 7        | 3          |
| 2022-05-22 | 01    | 8        | 1          |
+------------+-------+----------+------------+
Class
+------------+------+
| id_product | size |
+------------+------+
| 1          | S    |
| 2          | L    |
| 3          | XL   |
+------------+------+
I would like new rows to be created with null value for quantity of stock, but keeping the information of day, store, id_product and size.
My result
+------------+-------+----------+------------+------+
| day        | store | quantity | id_product | size |
+------------+-------+----------+------------+------+
| 2022-05-05 | 01    | 10       | 1          | S    |
| 2022-05-05 | 01    | 10       | 2          | L    |
| 2022-05-05 | 01    | 7        | 3          | XL   |
| 2022-05-22 | 01    | 8        | 1          | S    |
+------------+-------+----------+------------+------+
Expected
+------------+-------+----------+------------+------+
| day        | store | quantity | id_product | size |
+------------+-------+----------+------------+------+
| 2022-05-05 | 01    | 10       | 1          | S    |
| 2022-05-05 | 01    | 10       | 2          | L    |
| 2022-05-05 | 01    | 7        | 3          | XL   |
| 2022-05-22 | 01    | 8        | 1          | S    |
| 2022-05-22 | 01    | null     | 2          | L    |
| 2022-05-22 | 01    | null     | 3          | XL   |
+------------+-------+----------+------------+------+

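I guess you first need all combinations of day, store, and product (with its size), and then a left outer join with products, so that missing combinations get a null quantity.
// All distinct (day, store) pairs crossed with every product class
products
  .select($"day", $"store")
  .distinct
  .crossJoin(classes)
  .as("keys")
  // The left join keeps combinations that have no matching product row
  .join(
    products.as("prds"),
    Seq("day", "store", "id_product"),
    "left"
  )
  .select(
    $"keys.day", $"keys.store", $"prds.quantity",
    $"keys.id_product", $"keys.size"
  )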
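
The same idea (build the full key set first, then left join) can be sketched without Spark in plain Python, which makes the two steps easy to inspect. The function name fill_missing_rows and the list-of-dicts layout are assumptions for illustration only; the data is copied from the question.

```python
import itertools

# Data copied from the question.
products = [
    {"day": "2022-05-05", "store": "01", "quantity": 10, "id_product": 1},
    {"day": "2022-05-05", "store": "01", "quantity": 10, "id_product": 2},
    {"day": "2022-05-05", "store": "01", "quantity": 7, "id_product": 3},
    {"day": "2022-05-22", "store": "01", "quantity": 8, "id_product": 1},
]
classes = [
    {"id_product": 1, "size": "S"},
    {"id_product": 2, "size": "L"},
    {"id_product": 3, "size": "XL"},
]


def fill_missing_rows(products, classes):
    """Hypothetical helper: cross join the keys, then left join the quantities."""
    # Distinct (day, store) pairs, kept in first-seen order.
    day_store = list(dict.fromkeys((p["day"], p["store"]) for p in products))
    # Lookup table that plays the role of the right side of the left join.
    quantity = {
        (p["day"], p["store"], p["id_product"]): p["quantity"] for p in products
    }
    rows = []
    for (day, store), cls in itertools.product(day_store, classes):
        rows.append({
            "day": day,
            "store": store,
            # None stands in for null when the combination has no product row.
            "quantity": quantity.get((day, store, cls["id_product"])),
            "id_product": cls["id_product"],
            "size": cls["size"],
        })
    return rows


rows = fill_missing_rows(products, classes)
for row in rows:
    print(row)
```

With the sample data this yields six rows, the last two being the 2022-05-22 rows for products 2 and 3 with a missing quantity.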

Getting the week of the year of particular days, considering Monday as the first day of the week in Excel Formula

I'm trying to find the week of the year for particular dates using a formula in Excel. I found that Excel treats Sunday as the first day of the week instead of Monday.
I used the formula =WEEKNUM(A2) (where A2 is the cell containing the date) and got the result below:
--------------------------------
| Date | Week of Year |
--------------------------------
| 5/16/2015 | 20 |
| 5/17/2015 | 21 |
| 5/18/2015 | 21 |
| 5/19/2015 | 21 |
| 5/20/2015 | 21 |
| 5/21/2015 | 21 |
| 5/22/2015 | 21 |
| 5/23/2015 | 21 |
| 5/24/2015 | 22 |
| 5/25/2015 | 22 |
--------------------------------
But how do I get the result below, with Monday considered the first day of the week?
--------------------------------
| Date | Week of Year |
--------------------------------
| 5/16/2015 | 20 |
| 5/17/2015 | 20 |
| 5/18/2015 | 21 |
| 5/19/2015 | 21 |
| 5/20/2015 | 21 |
| 5/21/2015 | 21 |
| 5/22/2015 | 21 |
| 5/23/2015 | 21 |
| 5/24/2015 | 21 |
| 5/25/2015 | 22 |
--------------------------------

Pass 2 as the second argument (return_type) to WEEKNUM, so that weeks start on Monday:
=WEEKNUM(A2, 2)
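
To see why the second argument changes the result, here is a small Python sketch that mimics the numbering for return types 1 (weeks start on Sunday) and 2 (weeks start on Monday), where week 1 is the week containing January 1. The function weeknum is a hypothetical re-implementation for illustration, not Excel's code, and it covers only these two return types.

```python
from datetime import date


def weeknum(d: date, return_type: int = 1) -> int:
    """Week of the year, with week 1 being the week that contains January 1."""
    jan1 = date(d.year, 1, 1)
    if return_type == 1:
        # Days between the start of week 1 (a Sunday) and January 1.
        offset = (jan1.weekday() + 1) % 7
    elif return_type == 2:
        # Days between the start of week 1 (a Monday) and January 1.
        offset = jan1.weekday()
    else:
        raise ValueError("this sketch only supports return types 1 and 2")
    day_of_year = d.timetuple().tm_yday
    return (day_of_year + offset - 1) // 7 + 1


for day in range(16, 26):
    d = date(2015, 5, day)
    print(d.isoformat(), weeknum(d, 1), weeknum(d, 2))
```

For the dates in the question, the Sunday 5/17/2015 falls in week 21 with return type 1 but stays in week 20 with return type 2, which is the desired behaviour.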

Dax measure for monthly running total on weekly granularity

I have to visualize an accumulated monthly budget on a daily level, that is, a running total for each individual month on a daily basis.
The budget targets are provided in a table like this:
+------+------------+--------+
| Area | Year-Week | Budget |
+------+------------+--------+
| A | 2020-01 | 50 |
| A | 2020-02 | 25 |
| A | 2020-03 | 50 |
| A | 2020-04 | 25 |
| B | 2020-01 | 50 |
| B | 2020-02 | 50 |
| B | 2020-03 | 50 |
| B | 2020-04 | 75 |
+------+------------+--------+
As you can see, the monthly budget for January is the sum of weeks 2020-01 through 2020-04.
The structure is provided in the calendar table, so I know which week belongs to which month.
What I need to do is create a DAX measure that divides the weekly budget by the working days.
The calendar also contains day-level information. For this purpose I have added this measure:
Working Days =
VAR vSelectedMonth =
    SELECTEDVALUE ( 'Calendar'[Fiscal Month Number] )
RETURN
    CALCULATE (
        COUNTROWS ( 'Calendar' ),
        FILTER (
            'Calendar',
            'Calendar'[Fiscal Month Number] = vSelectedMonth
                && 'Calendar'[IsWorkingDay] = 1
        )
    )
Now, how can I divide the weekly budget by the working days of the corresponding week and then accumulate this information for the whole month?
Based on the provided information, the result should look something like this:
+------+------------+----------------+
| Area | Date | Running Budget |
+------+------------+----------------+
| A | 01.01.2020 | 10 |
| A | 02.01.2020 | 20 |
| A | 03.01.2020 | 30 |
| A | 04.01.2020 | 40 |
| A | 05.01.2020 | 50 |
| A | 06.01.2020 | 50 |
| A | 07.01.2020 | 50 |
| A | 08.01.2020 | 55 |
+------+------------+----------------+
Of course, if the corresponding date is not a working day, that value from the day before should be shown.
I am grateful for every help!
Please let me know if you need further information.
Best Regards
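
The calculation described in the question can be sketched outside DAX to make the intended logic explicit: spread each weekly budget evenly over that week's working days, then accumulate day by day, so that non-working days repeat the previous value. The calendar below is hypothetical and was chosen only to mirror the expected output for area A (it does not follow the real January 2020 calendar), the function name running_budget is an assumption, and the sketch covers a single month without resetting at month boundaries.

```python
from collections import Counter
from datetime import date
from itertools import accumulate

# Weekly budgets for area A, copied from the question.
weekly_budget = {"2020-01": 50, "2020-02": 25}

# Hypothetical calendar rows: (date, year-week, is working day).
calendar = (
    [(date(2020, 1, d), "2020-01", d <= 5) for d in range(1, 8)]
    + [(date(2020, 1, d), "2020-02", True) for d in range(8, 13)]
)


def running_budget(calendar, weekly_budget):
    """Hypothetical helper: daily share of the weekly budget, accumulated."""
    # Number of working days in each week.
    working_days = Counter(week for _, week, working in calendar if working)
    # Each working day gets an equal share; non-working days add nothing,
    # so the running total simply carries the previous day's value.
    daily = [
        weekly_budget[week] / working_days[week] if working else 0.0
        for _, week, working in calendar
    ]
    return [(day, total) for (day, _, _), total in zip(calendar, accumulate(daily))]


result = running_budget(calendar, weekly_budget)
for day, total in result:
    print(day.strftime("%d.%m.%Y"), total)
```

A DAX measure would follow the same two steps: a per-day share (weekly budget divided by the working days of that week) and a cumulative sum filtered to the current month up to the current date.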

How to calculate running total + total remaining in Excel?

I have this set of data in Excel:
Date  | Year | Spend | Budget | Product
Jan   | 2017 | 300   | 350    | Pencils
Feb   | 2017 | 450   | 450    | Pencils
March | 2017 | 510   | 520    | Pencils
...   | ...  | ...   |
Dec   | 2017 | 234   | 240    | Pencils
Jan   | 2018 | 222   | 222    | Pencils
Feb   | 2018 | 458   | 500    | Pencils
March | 2018 | 345   | 400    | Pencils
...   | ...  | ...   |
Dec   | 2018 | 600   | 600    | Pencils
I'm trying to build a pivot table that shows:
RT stands for "running total"
Av stands for "available"
Year | 2017
| Jan | RT | Av | Feb | RT | Av | March | RT | Av
Pencils| 300 | 300| 50 | 450 | 750| 50| 510 |1260| 60
In brief, "available" = running total + budget for remaining months. Any ideas?
Thanks!

If I understand correctly:
available = sum(Budget) - sum(Spend)
running total = sum(Spend)
Add a column for the running total that sums the Spend column from the first row up to the current row.
Then add a column for available that sums the Budget column from the first row up to the current row and subtracts the running total of the same row from it.
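
As a quick check of these two definitions, here is a minimal Python sketch using the first three months of 2017 from the question. The variable names are illustrative; in Excel the same result comes from two helper columns with cumulative sums.

```python
from itertools import accumulate

# Jan, Feb, March 2017 for Pencils, copied from the question.
spend = [300, 450, 510]
budget = [350, 450, 520]

# Running total: cumulative Spend up to and including the current row.
running_total = list(accumulate(spend))

# Available: cumulative Budget minus the running total of the same row.
cumulative_budget = list(accumulate(budget))
available = [b - s for b, s in zip(cumulative_budget, running_total)]

print(running_total)
print(available)
```

The results (300, 750, 1260 for the running total and 50, 50, 60 for available) match the RT and Av values in the pivot layout from the question.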

Access Database Design(multilevel) with Exporting Issue for Excel

Hi, I am currently working on a project that contains individual information for each month, and I want to build a table or two to hold it (I don't want to create a table for each month). A simple illustration:
Jan
            weight  height
student a
student b
Feb
            weight  height
student a
student b
student c
What I want is to export the data to Excel in the form shown above. The weight and height columns are fixed, but I want the data clustered by month so that its organization is clearer.
How should I design the database so that this requirement is met? Thanks.

Here are the tables you'll need to store the information:
students
    id      unsigned int (P)
    name    varchar(50)
+----+------+
| id | name |
+----+------+
| 1 | John |
| 2 | Mary |
| 3 | Tina |
| .. | .... |
+----+------+
In the measurements table the Primary Key is formed by the student_id, year and month. The student_id is also a foreign key to the students table.
measurements
    student_id  unsigned int (F students.id) -\
    year        unsigned int ------------------(P)
    month       unsigned int -----------------/
    height      unsigned int
    weight      unsigned int
+------------+------+-------+--------+--------+
| student_id | year | month | height | weight |
+------------+------+-------+--------+--------+
| 1 | 2013 | 11 | 70 | 200 |
| 2 | 2013 | 11 | 65 | 130 |
| 1 | 2013 | 12 | 70 | 192 |
| 2 | 2013 | 12 | 65 | 126 |
| 3 | 2013 | 12 | 68 | 140 |
| .......... | .... | ..... | ...... | ...... |
+------------+------+-------+--------+--------+
And then a query to extract the information:
SELECT name, height, weight, year, month
FROM students s
    LEFT JOIN measurements m ON s.id = m.student_id
ORDER BY year, month, name
Which will give you:
+------+--------+--------+------+-------+
| name | height | weight | year | month |
+------+--------+--------+------+-------+
| John | 70 | 200 | 2013 | 11 |
| Mary | 65 | 130 | 2013 | 11 |
| John | 70 | 192 | 2013 | 12 |
| Mary | 65 | 126 | 2013 | 12 |
| Tina | 68 | 140 | 2013 | 12 |
+------+--------+--------+------+-------+
Which is the data you want, sorted in the way you want. Any further formatting of the data is up to your application.
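
To try the design end to end, the sketch below recreates both tables in an in-memory SQLite database, runs the query above, and groups the rows by month as an export step might. SQLite stands in for Access here purely for illustration, the column types are simplified to INTEGER, and the by_month grouping is an assumption about how the application could cluster the data.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (
    id   INTEGER PRIMARY KEY,
    name VARCHAR(50)
);
CREATE TABLE measurements (
    student_id INTEGER REFERENCES students(id),
    year       INTEGER,
    month      INTEGER,
    height     INTEGER,
    weight     INTEGER,
    PRIMARY KEY (student_id, year, month)  -- one measurement per student per month
);
INSERT INTO students VALUES (1, 'John'), (2, 'Mary'), (3, 'Tina');
INSERT INTO measurements VALUES
    (1, 2013, 11, 70, 200),
    (2, 2013, 11, 65, 130),
    (1, 2013, 12, 70, 192),
    (2, 2013, 12, 65, 126),
    (3, 2013, 12, 68, 140);
""")

rows = conn.execute("""
    SELECT name, height, weight, year, month
    FROM students s
        LEFT JOIN measurements m ON s.id = m.student_id
    ORDER BY year, month, name
""").fetchall()

# Cluster the sorted rows by (year, month), ready to be written out per month.
by_month = {
    key: [(name, height, weight) for name, height, weight, _, _ in group]
    for key, group in groupby(rows, key=lambda r: (r[3], r[4]))
}

for (year, month), students in by_month.items():
    print(year, month, students)
```

Because the rows are already ordered by year and month, groupby can cluster them in a single pass, and each group can then be written to its own block in the Excel export.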
