I support a business where customers pay for various services that they use on a monthly basis. I would like to use machine learning on customers' historical usage of these services to predict their future usage (increase or decrease).
I've built a two-class model that uses the previous month's (M-1) service usage and the current month's (M-0) usage to predict growth or decline. But I would like to start using all historical information, not only M-1.
How could I do this? Is my only option to keep adding M-2, M-3, M-4, ... columns? If so, I'm going to end up with hundreds of columns.
I'm new to machine learning, and I'm not sure which algorithm is best suited to the type of analysis I'm doing.
Here is an example of the original table that I have:
Customer Name | MonthName | Service | Usage
------------- | ---------------|---------|------
Customer1 | January, 2017 |Service2 |$400
Customer1 | January, 2017 |Service1 |$300
Customer1 | January, 2017 |Service3 |$0
Customer1 | December, 2017 |Service2 |$600
Customer1 | December, 2017 |Service1 |$500
Customer1 | December, 2017 |Service3 |$700
Customer1 | November, 2016 |Service1 |$500
Customer1 | November, 2016 |Service2 |$50
Customer1 | October, 2016 |Service1 |$800
Customer2 | January, 2017 |Service2 |$400
Customer2 | January, 2017 |Service1 |$800
Customer2 | December, 2017 |Service2 |$600
Customer2 | December, 2017 |Service1 |$500
Customer2 | November, 2016 |Service1 |$500
Customer2 | November, 2016 |Service2 |$50
Customer2 | October, 2016 |Service1 |$800
Here is the table I'm currently using to build the two-class model:
+----------------+------------------+-----------------+-----------------+-----------------+-----------+-----------+-----------+-----------+-------+--------------------+
| Customer Name | MonthName | Service1 - M-1 | Service2 - M-1 | Service3 - M-1 | Usage M-1 | Service1 | Service2 | Service3 | Usage | Usage Decline Flag |
+----------------+------------------+-----------------+-----------------+-----------------+-----------+-----------+-----------+-----------+-------+--------------------+
| Customer1 | October, 2016 | 0 | 0 | 0 | 0 | 800 | | | 800 | 0 |
| Customer1 | November, 2016 | 800 | | | 800 | 500 | 50 | | 550 | 1 |
| Customer1 | December, 2017 | 500 | 50 | | 550 | 500 | 600 | 700 | 1800 | 0 |
| Customer1 | January, 2017 | 500 | 600 | 700 | 1800 | 300 | 400 | 0 | 700 | 1 |
| Customer2 | October, 2016 | 0 | 0 | 0 | 0 | 1600 | | | 1600 | 0 |
| Customer2 | November, 2016 | 1600 | | | 1600 | 500 | 100 | | 600 | 1 |
| Customer2 | December, 2017 | 500 | 100 | | 600 | 500 | 600 | | 1100 | 0 |
| Customer2 | January, 2017 | 500 | 600 | | 1100 | 800 | 400 | | 1200 | 0 |
+----------------+------------------+-----------------+-----------------+-----------------+-----------+-----------+-----------+-----------+-------+--------------------+
Try generating lag features. This module converts rows to columns so that each row also carries the values from previous periods (for example, sales for previous days): https://gallery.cortanaintelligence.com/CustomModule/Generate-Lag-Features-1. Its source code is here: https://gist.github.com/nk773/a2ed7cd0ce8020647f5e7711f749b3b5
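A minimal Python sketch of the lag-feature idea, assuming the long table above is available as (customer, month name, service, usage) tuples: it totals usage per customer and calendar month, then attaches the usage of the previous `n_lags` calendar months to each row. The function names and the zero-fill for months with no data are my own assumptions, not part of the linked module.

```python
from collections import defaultdict
from datetime import datetime


def month_index(month_name):
    """Convert e.g. 'January, 2017' to a running month number, so calendar gaps are visible."""
    dt = datetime.strptime(month_name, "%B, %Y")
    return dt.year * 12 + (dt.month - 1)


def lag_features(rows, n_lags):
    """One row per (customer, month): total usage plus usage at M-1 .. M-n_lags.

    rows: iterable of (customer, month_name, service, usage) tuples.
    Months with no recorded usage are treated as 0.0 (an assumption).
    """
    totals = defaultdict(float)
    for customer, month_name, _service, usage in rows:
        totals[(customer, month_index(month_name))] += usage

    features = []
    for customer, idx in sorted(totals):
        # look back by calendar month, not by position in the data
        lags = [totals.get((customer, idx - k), 0.0) for k in range(1, n_lags + 1)]
        year, month = divmod(idx, 12)
        features.append((customer, f"{year}-{month + 1:02d}", totals[(customer, idx)], lags))
    return features


rows = [
    ("Customer1", "October, 2016", "Service1", 800),
    ("Customer1", "November, 2016", "Service1", 500),
    ("Customer1", "November, 2016", "Service2", 50),
    ("Customer1", "January, 2017", "Service1", 300),
    ("Customer1", "January, 2017", "Service2", 400),
]
for row in lag_features(rows, n_lags=3):
    print(row)
```

Because `n_lags` is fixed, the number of columns stays bounded however much history you have. This sample leaves December out, so it shows up as a 0.0 lag for January 2017.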
I'm trying to find the week of the year for particular dates using a formula in Excel. I found that Excel treats Sunday as the first day of the week instead of Monday.
I used the formula =WEEKNUM(A2) (where A2 contains the date) and got the following result:
--------------------------------
| Date | Week of Year |
--------------------------------
| 5/16/2015 | 20 |
| 5/17/2015 | 21 |
| 5/18/2015 | 21 |
| 5/19/2015 | 21 |
| 5/20/2015 | 21 |
| 5/21/2015 | 21 |
| 5/22/2015 | 21 |
| 5/23/2015 | 21 |
| 5/24/2015 | 22 |
| 5/25/2015 | 22 |
--------------------------------
How do I get the following result instead, treating Monday as the first day of the week?
--------------------------------
| Date | Week of Year |
--------------------------------
| 5/16/2015 | 20 |
| 5/17/2015 | 20 |
| 5/18/2015 | 21 |
| 5/19/2015 | 21 |
| 5/20/2015 | 21 |
| 5/21/2015 | 21 |
| 5/22/2015 | 21 |
| 5/23/2015 | 21 |
| 5/24/2015 | 21 |
| 5/25/2015 | 22 |
--------------------------------
Pass 2 as the second argument (return_type) of WEEKNUM; with 2, weeks start on Monday:
=WEEKNUM(A2, 2)
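For intuition about what the second argument changes, here is a Python sketch of the week-numbering rule for return_type 1 and 2: in both cases the week containing January 1 is week 1, and only the day on which a new week starts differs. The `weeknum` function is a hypothetical re-implementation for illustration, not Excel's code, and it models only these two return types.

```python
from datetime import date


def weeknum(d, return_type=1):
    """Sketch of WEEKNUM for return_type 1 (weeks start Sunday) and 2 (weeks start Monday).

    The week containing January 1 is week 1.
    """
    if return_type not in (1, 2):
        raise ValueError("only return_type 1 and 2 are sketched")
    jan1 = date(d.year, 1, 1)
    # date.weekday(): Monday=0 ... Sunday=6
    if return_type == 2:
        offset = jan1.weekday()            # days from the week's Monday to Jan 1
    else:
        offset = (jan1.weekday() + 1) % 7  # days from the week's Sunday to Jan 1
    day_of_year = d.timetuple().tm_yday
    return (day_of_year - 1 + offset) // 7 + 1


for day in range(16, 26):
    d = date(2015, 5, day)
    print(d, weeknum(d, 1), weeknum(d, 2))
```

Running it reproduces both tables in the question: with return_type 1, Sunday 5/17/2015 starts week 21; with return_type 2, week 21 starts on Monday 5/18/2015.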
I want to create a new column that takes the previous year's rate if the current month is January, and the current rate otherwise.
| Date | Sales | Rates | Month | Year |
|:-----------|---------:|:-----:|-------:|:-------:|
| 1/1/2017 | 10000 | 8.0 | 1 | 2017 |
| 1/1/2018 | 20000 | 8.2 | 1 | 2018 |
| 2/1/2018 | 15000 | 8.2 | 2 | 2018 |
| 1/1/2019 | 11000 | 8.5 | 1 | 2019 |
| 3/1/2019 | 18000 | 8.5 | 3 | 2019 |
| 1/3/2020 | 22000 | 9.0 | 1 | 2020 |
In other words, for January rows the new column should contain the previous year's rate.
I tried the following, but it doesn't give the expected results:
if [Year] > 2017 and [Month] = 1 then [Rates] = [Rates] and [Year] = 2018 else [Rates]
Try this as a calculated column:
New Rates =
VAR prevYear = 'Table'[Year] - 1
VAR ratePreviousYear =
    MAXX ( FILTER ( 'Table', 'Table'[Year] = prevYear ), 'Table'[Rates] )
RETURN
    IF (
        'Table'[Month] > 1
            || ISBLANK ( ratePreviousYear ),
        'Table'[Rates],
        ratePreviousYear
    )
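The same rule, sketched in Python as a quick sanity check against the sample table: like MAXX over FILTER, it takes the highest rate recorded for the previous year, and it falls back to the row's own rate for non-January rows or when no previous year exists. The dict-based rows and the `new_rates` name are illustrative assumptions.

```python
def new_rates(rows):
    """rows: dicts with 'Year', 'Month' and 'Rates' keys.

    January rows get the highest rate of the previous year, if that year exists;
    every other row keeps its own rate (mirrors the IF/ISBLANK logic).
    """
    max_rate_by_year = {}
    for r in rows:
        year = r["Year"]
        max_rate_by_year[year] = max(r["Rates"], max_rate_by_year.get(year, r["Rates"]))

    result = []
    for r in rows:
        previous = max_rate_by_year.get(r["Year"] - 1)  # None plays the role of BLANK()
        if r["Month"] > 1 or previous is None:
            result.append(r["Rates"])
        else:
            result.append(previous)
    return result


table = [
    {"Year": 2017, "Month": 1, "Rates": 8.0},
    {"Year": 2018, "Month": 1, "Rates": 8.2},
    {"Year": 2018, "Month": 2, "Rates": 8.2},
    {"Year": 2019, "Month": 1, "Rates": 8.5},
    {"Year": 2019, "Month": 3, "Rates": 8.5},
    {"Year": 2020, "Month": 1, "Rates": 9.0},
]
print(new_rates(table))  # [8.0, 8.0, 8.2, 8.2, 8.5, 8.5]
```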
I have a dataset with the current stock for some products:
+--------------+-------+
| Product | Stock |
+--------------+-------+
| chocolate | 300 |
| coal | 70 |
| orange juice | 400 |
+--------------+-------+
and, in another dataset, the sales of every product in past years for the current month and the next month:
+--------------+------+-------+-------+
| Product      | Year | Month | Sales |
+--------------+------+-------+-------+
| chocolate    | 2017 | 05    | 55    |
| chocolate    | 2017 | 04    | 250   |
| chocolate    | 2016 | 05    | 70    |
| chocolate    | 2016 | 04    | 200   |
| coal         | 2017 | 05    | 40    |
| coal         | 2017 | 04    | 30    |
| coal         | 2016 | 05    | 50    |
| coal         | 2016 | 04    | 20    |
| orange juice | 2017 | 05    | 400   |
| orange juice | 2017 | 04    | 350   |
| orange juice | 2016 | 05    | 400   |
| orange juice | 2016 | 04    | 300   |
+--------------+------+-------+-------+
I want to compute the stock I will need to order for next month from the expected sales over the current month and the next month, using the following formula (each max is taken over the years):
ExpectedSales = max(salesMaxCurrentMonth) + max(salesMaxNextMonth)
The orders will then be
Orders = ExpectedSales * (1 + margin) - Stock
Where margin is, for example, 10%.
I tried to group by several columns using groupBy, as follows, but it seems to aggregate by Stock instead of by Product:
salesDataset
    .groupBy(Columns.col("Month"), Columns.col("Product"))
    .agg(Columns.max("Sales").as("SalesMaxPerMonth"))
    .agg(Columns.sum("SalesMaxPerMonth").as("SalesPeriod"))
    .withColumn(
        "SalesExpected",
        Columns.col("SalesPeriod").multiply(Columns.literal(1 + margin)))
    .withColumn(
        "Orders",
        Columns.col("SalesExpected").minus(Columns.col("Stock")))
    .withColumn(
        "Orders",
        Columns.col("Orders").map((Double a) -> a >= 0 ? a : 0))
    .doNotAggregateAbove()
    .toCellSet()
    .show();
Your aggregation logic is correct, but there is another way to build the CellSet: pass toCellSet a map that describes the location of the query that generates it.
salesDataset
    .groupBy(Columns.col("Month"), Columns.col("Product"))
    .agg(Columns.max("Sales").as("SalesMaxPerMonth"))
    .agg(Columns.sum("SalesMaxPerMonth").as("SalesPeriod"))
    .withColumn(
        "SalesExpected",
        Columns.col("SalesPeriod").multiply(Columns.literal(1 + margin)))
    .withColumn("Orders", Columns.col("SalesExpected").minus(Columns.col("Stock")))
    .withColumn("Orders", Columns.col("Orders").map((Double a) -> a >= 0 ? a : 0))
    .doNotAggregateAbove()
    .toCellSet(
        Empty.<String, Object>map()
            .put("Product", null)
            .put("Stock", null))
    .show();
Here, null in a location represents the wildcard *.
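To check what either query should return, here is a plain-Python sketch of the ExpectedSales/Orders formula from the question (not ActivePivot code): per product, take the maximum sales across years for each of the two months, add them, apply the margin, subtract the stock, and clamp negative orders to 0 as the map(...) step does. The `orders` function and the tuple layout are assumptions for illustration.

```python
from collections import defaultdict


def orders(stock, sales, margin=0.10):
    """stock: {product: units in stock}
    sales: list of (product, year, month, units) for the current and next month.
    Returns {product: units to order}, never negative."""
    # max sales across years for each (product, month)
    best = {}
    for product, _year, month, units in sales:
        key = (product, month)
        best[key] = max(units, best.get(key, units))
    # ExpectedSales = max(current month) + max(next month)
    expected = defaultdict(float)
    for (product, _month), units in best.items():
        expected[product] += units
    # Orders = ExpectedSales * (1 + margin) - Stock, clamped at 0
    return {
        product: max(expected[product] * (1 + margin) - in_stock, 0.0)
        for product, in_stock in stock.items()
    }


stock = {"chocolate": 300, "coal": 70, "orange juice": 400}
sales = [
    ("chocolate", 2017, "05", 55), ("chocolate", 2017, "04", 250),
    ("chocolate", 2016, "05", 70), ("chocolate", 2016, "04", 200),
    ("coal", 2017, "05", 40), ("coal", 2017, "04", 30),
    ("coal", 2016, "05", 50), ("coal", 2016, "04", 20),
    ("orange juice", 2017, "05", 400), ("orange juice", 2017, "04", 350),
    ("orange juice", 2016, "05", 400), ("orange juice", 2016, "04", 300),
]
print(orders(stock, sales))
```

With a 10% margin this comes to about 52 units of chocolate (250 + 70 = 320 expected), 18 of coal and 425 of orange juice, give or take floating-point rounding.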
I have this set of data in Excel:
Date | Year | Spend | Budget | Product
Jan | 2017 | 300 | 350 | Pencils
Feb | 2017 | 450 | 450 | Pencils
March | 2017 | 510 | 520 | Pencils
... | ... | ... | ... | ...
Dec | 2017 | 234 | 240 | Pencils
Jan | 2018 | 222 | 222 | Pencils
Feb | 2018 | 458 | 500 | Pencils
March | 2018 | 345 | 400 | Pencils
... | ... | ... | ... | ...
Dec | 2018 | 600 | 600 | Pencils
I'm trying to build a pivot table that shows the following, where RT stands for "running total" and Av stands for "available":
Year | 2017
| Jan | RT | Av | Feb | RT | Av | March | RT | Av
Pencils| 300 | 300| 50 | 450 | 750| 50| 510 |1260| 60
In brief, "available" = running total + budget for remaining months. Any ideas?
Thanks!
If I understand correctly:
available = sum(Budget) - sum(Spend)
running total = sum(Spend)
Add a running-total column that sums the Spend column from the first row up to the current row.
Then add an available column that sums the Budget column from the first row up to the current row and subtracts the running total of the same row.
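To check the two columns against the expected pivot (Pencils, January to March 2017), here is a small Python sketch of the same running sums. It assumes the rows are already split by product and year and sorted by month; the function name is hypothetical.

```python
from itertools import accumulate


def running_total_and_available(spend, budget):
    """spend/budget: monthly values in calendar order for ONE product and ONE year.

    running total = cumulative Spend; available = cumulative Budget - running total.
    """
    running_total = list(accumulate(spend))
    cumulative_budget = list(accumulate(budget))
    available = [b - rt for b, rt in zip(cumulative_budget, running_total)]
    return running_total, available


# Pencils, Jan-Mar 2017 from the question
rt, av = running_total_and_available([300, 450, 510], [350, 450, 520])
print(rt)  # [300, 750, 1260]
print(av)  # [50, 50, 60]
```

These match the RT and Av columns in the question's expected layout.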
I'm trying to model some outbound calling data in PowerPivot. We have reps across multiple locations, and we generally break our outbound calling down into two periods of the day (before and after 12 pm).
We can export from our phone system a list of every call made in a day; for example:
+------------+-------------+-------+-----------+-------------+
| Date | Call Length | Agent | Workgroup | Call Period |
+------------+-------------+-------+-----------+-------------+
| 01.01.2016 | 00:05:26 | Sam | Sydney | 1 |
| 01.01.2016 | 00:15:05 | Sam | Sydney | 1 |
| 01.01.2016 | 00:55:22 | John | Sydney | 2 |
| 01.01.2016 | 00:45:11 | Sam | Sydney | 2 |
| 01.01.2016 | 00:04:52 | John | Sydney | 1 |
| 01.01.2016 | 00:01:52 | Timmy | London | 1 |
| 01.01.2016 | 00:02:21 | Timmy | London | 2 |
| 01.01.2016 | 00:05:21 | Karen | London | 1 |
| 02.01.2016 | 00:15:21 | Sam | Sydney | 1 |
| 02.01.2016 | 00:42:44 | Sam | Sydney | 2 |
| 02.01.2016 | 01:52:22 | John | Sydney | 1 |
| 02.01.2016 | 00:53:24 | John | Sydney | 1 |
| 02.01.2016 | 00:05:53 | Kerry | Sydney | 2 |
| 02.01.2016 | 00:43:43 | Sam | Sydney | 2 |
| 02.01.2016 | 01:08:00 | John | Sydney | 2 |
| 02.01.2016 | 00:13:52 | Timmy | London | 2 |
| 02.01.2016 | 00:25:44 | Timmy | London | 1 |
| 02.01.2016 | 02:58:31 | Karen | London | 1 |
| 02.01.2016 | 00:08:37 | Timmy | London | 2 |
| 02.01.2016 | 00:12:28 | Karen | London | 2 |
+------------+-------------+-------+-----------+-------------+
What I'm trying to calculate is the average daily time spent on the phone per workgroup, i.e., on average, how long each agent is on the phone at each location.
I'm guessing the arithmetic is as follows:
Measure 1: Total talk time for each agent (e.g., the sum of all of the agent's talk time for the day)
Measure 2: Average agent talk time per workgroup (e.g., the sum of Measure 1 grouped by workgroup, divided by the number of agents in that workgroup)
The output might look something like this (but doesn't have to be):
+------------+-----------+-----------------------+-----------------+-----------------------------+
| Date | Workgroup | Total Number of Calls | Total Talk Time | Average Talk Time per Agent |
+------------+-----------+-----------------------+-----------------+-----------------------------+
| 01.01.2016 | Sydney | 11 | 03:02:42 | 1:34:53 |
| | London | 4 | 02:24:51 | 01:13:41 |
| 02.01.2016 | Sydney | 5 | 01:52:05 | 00:56:51 |
| | London | 52 | 10:11:23 | 03:51:11 |
+------------+-----------+-----------------------+-----------------+-----------------------------+
Apologies if my question is unclear.
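The two measures described in the question can be sketched in Python to check the arithmetic on the first day of the sample export. This assumes call lengths are HH:MM:SS strings and that "number of agents" means the agents who made at least one call in that workgroup on that day; the function names are hypothetical.

```python
from collections import defaultdict


def to_seconds(hms):
    """'HH:MM:SS' -> seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def talk_time_per_agent(calls):
    """calls: (date, call_length, agent, workgroup, call_period) tuples.
    Returns {(date, workgroup): (number of calls, total seconds, average seconds per agent)}."""
    per_agent = defaultdict(int)  # Measure 1: total talk time per agent per day
    n_calls = defaultdict(int)
    for day, length, agent, workgroup, _period in calls:
        per_agent[(day, workgroup, agent)] += to_seconds(length)
        n_calls[(day, workgroup)] += 1

    total = defaultdict(int)
    n_agents = defaultdict(int)
    for (day, workgroup, _agent), seconds in per_agent.items():
        total[(day, workgroup)] += seconds
        n_agents[(day, workgroup)] += 1

    # Measure 2: workgroup total divided by the number of agents who called that day
    return {key: (n_calls[key], total[key], total[key] / n_agents[key]) for key in total}


calls = [
    ("01.01.2016", "00:05:26", "Sam", "Sydney", 1),
    ("01.01.2016", "00:15:05", "Sam", "Sydney", 1),
    ("01.01.2016", "00:55:22", "John", "Sydney", 2),
    ("01.01.2016", "00:45:11", "Sam", "Sydney", 2),
    ("01.01.2016", "00:04:52", "John", "Sydney", 1),
    ("01.01.2016", "00:01:52", "Timmy", "London", 1),
    ("01.01.2016", "00:02:21", "Timmy", "London", 2),
    ("01.01.2016", "00:05:21", "Karen", "London", 1),
]
for key, value in sorted(talk_time_per_agent(calls).items()):
    print(key, value)
# ('01.01.2016', 'London') (3, 574, 287.0)
# ('01.01.2016', 'Sydney') (5, 7556, 3778.0)
```

For Sydney that is 7,556 seconds over two agents, i.e. 3,778 seconds (1:02:58) per agent.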
Slicing your data in a pivot table (by Date and Workgroup) will do the grouping for you.
You only need the following measures:
DurationOfCall := SUM ( MyTable[Call Length] )
NrOfCalls := COUNTROWS ( MyTable )
AvgDuration := DIVIDE ( [DurationOfCall], [NrOfCalls] )
On your sample dataset, this gives the result shown in the attached workbook with the test case.
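For reference, here is what the three measures compute for each Date/Workgroup cell of the pivot, sketched in Python on the first day of the sample data. Note that AvgDuration averages per call rather than per agent. The names are illustrative, and None stands in for the blank that DIVIDE returns when the denominator is 0.

```python
from collections import defaultdict


def to_seconds(hms):
    """'HH:MM:SS' -> seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def call_measures(calls):
    """Per (date, workgroup): (DurationOfCall, NrOfCalls, AvgDuration), durations in seconds."""
    duration = defaultdict(int)  # like DurationOfCall
    count = defaultdict(int)     # like NrOfCalls
    for day, length, _agent, workgroup, _period in calls:
        duration[(day, workgroup)] += to_seconds(length)
        count[(day, workgroup)] += 1
    # like AvgDuration: None instead of an error when there are no calls
    return {
        key: (duration[key], count[key], duration[key] / count[key] if count[key] else None)
        for key in duration
    }


calls = [
    ("01.01.2016", "00:05:26", "Sam", "Sydney", 1),
    ("01.01.2016", "00:15:05", "Sam", "Sydney", 1),
    ("01.01.2016", "00:55:22", "John", "Sydney", 2),
    ("01.01.2016", "00:45:11", "Sam", "Sydney", 2),
    ("01.01.2016", "00:04:52", "John", "Sydney", 1),
    ("01.01.2016", "00:01:52", "Timmy", "London", 1),
    ("01.01.2016", "00:02:21", "Timmy", "London", 2),
    ("01.01.2016", "00:05:21", "Karen", "London", 1),
]
for key, value in sorted(call_measures(calls).items()):
    print(key, value)
```

For Sydney this gives 7,556 seconds over 5 calls, i.e. an average of about 1,511 seconds per call.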