Excel Forecast with occasional random numbers [Capacity Planning] - excel

I'm currently working on a capacity plan and forecasting future growth. As we get larger and acquire more customers, we'll eventually stall out and our numbers will start to slow. I'd like to figure out a way to add random numbers into my forecast that will show a "decrease".
Example:
2008: 20
2009: 32
2010: 45
2011: 49
2012: 52
2013: 60
2014: 72
2015: 88
2016: 102
2017: 113
2018: 142
2019: 130
2020: 127
etc, etc - hope this all makes sense.

Statistical software like R would be much more suited for this. Ideally, as a business, you will continue acquiring more customers at a steady rate, but the relative percentage of customer acquisitions will level off.
I will explain the Excel sheet I made for you. For the years 2008-2017, we have the data you gave me. I use that data to calculate a rolling 2-year moving average and 2-year moving standard deviation. These values are not all that useful on their own, except in the calculation of the random numbers we are looking for.
From 2018 onward, we generate random numbers from a Gaussian distribution using the past-2-year moving average and past-2-year standard deviation. The numbers are completely random and generally come out looking quite interesting. They are recalculated every time you edit a formula, which is neat (double-click any cell, press Enter, and watch the numbers move). I also included a graph of the expected growth over time. It is a fun tool to play with, if I do say so myself. I included a screenshot here:
For those of you who like statistics, the function I used was =NORMINV(RAND(),mean,stddev). This creates a Gaussian distribution around some mean value and stddev. It is quite interesting, and has the nice effect of "leveling off" most of the time in calculations, which is perfect for this application.
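If you would rather prototype the same idea outside Excel, here is a minimal Python sketch. The 2018-2025 horizon, the use of numpy, and the exact window handling are my assumptions, not part of the original workbook; numpy's normal() plays the role of NORMINV(RAND(), mean, stddev).

```python
import numpy as np

# Historical data from the question (2008-2017)
years = list(range(2008, 2018))
values = [20, 32, 45, 49, 52, 60, 72, 88, 102, 113]

rng = np.random.default_rng()

# Extend the series: each new year is drawn from a normal distribution
# whose mean and standard deviation come from the previous 2 values,
# mirroring =NORMINV(RAND(), mean, stddev) in the spreadsheet.
for year in range(2018, 2026):
    window = values[-2:]
    mean, stddev = np.mean(window), np.std(window)
    values.append(float(rng.normal(mean, stddev)))
    years.append(year)

for y, v in zip(years, values):
    print(y, round(v))
```

Because each simulated value feeds back into the 2-year window, the series wanders rather than growing in a straight line, which is the "leveling off" behaviour described above.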
The only problem is, I have no idea how to get this small file to you. Someone please comment back to let me know how so I can help this poor soul out.

Related

How to compute a MDX measure in Excel that multiplies the values in one variable by a factor depending on their value in another variable?

First of all I'd like to make it clear that I'm a total newbie to MDX and only grasp the absolute basics. I'll do my very best to describe my issue, and I hope some of you have time to help me. If it's too much to ask, I totally understand, but not putting my question out there wouldn't get me anywhere either way, right?
I'm working with an Excel OLAP pivot table, and I'm looking to create a new MDX measure that multiplies the sum of expenditure in each of the last five years by a factor, as in the example below:
Year    Expenditure    Factor
2018    100.000        1.05
2019    100.000        1.04
2019    100.000        1.03
2020    100.000        1.02
2021    100.000        1.01
I have an Excel OLAP pivot table with the measure [Measures].[Expected Expenditure] and the dimension [Date].[Year].[Year]. The dimension [Date].[Year].[Year] also holds data for years other than the five years I need to weight the expenditure in. The factor is a set number and I'm looking to hard-code it into the calculation.
How do I go about creating a new MDX measure that weights the expenditure in each of the five years, but doesn't add a weight to the expenditure in the years outside the scope?
Please let me know if the above description is deficient or if there's anything I need to clarify further. English is not my native tongue and I apologize for any incoherent writing.
Best regards,
Magnus

Forecast or Estimate Next Month Sales For Each Customer

The problem I am currently working on has data for 27 customers and the total purchase amount each has transacted for every month in 2021, from Jan until Sept. The data looks like the image attached to this question/post.
sample dataset
I could simply use the average to find the next value, but that wouldn't be very precise. Then again, in the absence of any other data or features/columns, is that the only way to solve this, or are there other methods anyone can suggest? Note: both Excel and/or Python examples are fine.
Additional note: I have already tried the FORECAST functions in Excel, but I am not sure whether the outcome is correct, since the Microsoft documentation merely provides the formula by which each function performs its calculations. Overall Excel provides 5 types of FORECAST(.**) functions, but the documentation is poor, which becomes a problem if tomorrow I want to write the same solution in Python or any other programming language.
Taking a cursory glance at the data, there may be complexity I'm missing, like seasonality, trend, noise, outliers, etc., but let's just assume the data is a simple trend line for each client.
At a purely high level, Excel can do a simple FORECAST.ETS(target_date, values, timeline, [seasonality], [data_completion], [aggregation]) formula.
This can be streamlined with Excel's built-in Forecast Sheet data tool.
I could talk about Python, but that's a little more hands-on for a time series forecast; see the sketch below.
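Since the question says Excel and/or Python examples are fine, here is a minimal Python sketch under the "simple trend line per client" assumption: fit a straight line to each customer's Jan-Sep totals and extrapolate one month ahead. The customer names and amounts below are made up for illustration.

```python
import numpy as np

# Hypothetical monthly purchase totals (Jan-Sep 2021) for two customers;
# substitute the real 27 customers here.
sales = {
    "Customer A": [120, 130, 125, 140, 150, 155, 160, 170, 175],
    "Customer B": [300, 290, 310, 305, 320, 315, 330, 325, 340],
}

months = np.arange(1, 10)  # Jan=1 ... Sep=9

for name, values in sales.items():
    # Fit a simple linear trend (similar in spirit to FORECAST.LINEAR)
    slope, intercept = np.polyfit(months, values, 1)
    october = slope * 10 + intercept
    print(f"{name}: forecast for Oct 2021 = {october:.1f}")
```

This is the most naive time-series approach; if the data does turn out to have seasonality or outliers, a dedicated library (statsmodels, Prophet, etc.) would be the next step.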

Excel continuous time function graph

I would like to make a plot in Excel. The graph represents stock amount versus months, but I would like to do it in continuous time. Each month the stock starts at 30 and decreases over time to 10 until the next month, when the stock goes back up to 30 with a 20-piece increase. I couldn't post the graph here. Thanks for all your help.
"http://i.hizliresim.com/G5QE26.jpg" this is the link that i want to do.
This might get you started.
You could use a scatterplot with straight lines connecting the data points:
There are numerous ways to tweak the appearance, though I don't know of any easy way to get the numbers to appear as dates.
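If it helps to preview the shape outside Excel first, here is a rough Python/matplotlib sketch of the sawtooth described in the question; the six-month horizon and the per-month resolution are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

months = 6             # number of months to show (arbitrary)
points_per_month = 30  # resolution within each month

x, y = [], []
for m in range(months):
    t = np.linspace(0, 1, points_per_month, endpoint=False)
    x.extend(m + t)        # continuous time axis
    y.extend(30 - 20 * t)  # falls linearly from 30 towards 10

plt.plot(x, y)
plt.xlabel("Month")
plt.ylabel("Stock amount")
plt.title("Stock level resetting each month")
plt.show()
```

Connecting consecutive points gives the near-vertical jump back to 30 at each month boundary, which is the same effect as the scatterplot-with-lines suggestion above.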

Converting TEXT that represents NEGATIVE TIME value to a number or time value for adding (Excel)

I've got a spreadsheet (Office 2007 version of Excel) full of text entries that are negative time values, example "-0:07" as in an employee took 7 mins less to complete a job than expected. I need to perform mathematical calculations on these entries and am looking for a more elegant formula/method than I've come up with so far.
I know about 1904 date system and * or / by 24 to convert back and forth, the problem is getting a formula that will recognize the text entry as a negative time value.
I've tried VALUE() and *1, which both work on the text fields if the number is positive, but the "-" seems to mess them up. Even Paste Special/Add fails to recognize these as numbers.
Here's what I came up with that gets the job done, but it's just so ugly to me:
=IF(LEFT(E5,1)="-",((VALUE(RIGHT(E5,LEN(E5)-1)))*-1.0),VALUE(E5))
Obviously my text entry is in cell E5 in this example.
This works, so I'm not desperate for a solution, but for educational purposes (and smaller code) I'd like to know if there's a better way to do this. Does anyone have a suggestion for something shorter or easier?
Thanks.
P.S. - an interesting tidbit here, I use Excel at work, but not at home, so I uploaded a sample spreadsheet to Google Docs, and it actually handles the Value() command on those entries properly. Weird, huh?
Thanks again for any suggestions.
Excel doesn't handle time spans in cells; it only deals with times of day. When you enter "00:07" it is converted to 0.0048611, which is the same as 12:07 AM on January 1st, 1900. So if you did 2 minutes minus 7 minutes it would give, at best, 11:55 PM.
The way you do it is the only way.
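If the same entries ever need to be processed outside Excel, parsing the signed text directly sidesteps the date-system issue entirely. A minimal Python sketch, assuming the values always look like "-0:07" or "1:23":

```python
def signed_minutes(text):
    """Convert text like '-0:07' or '1:23' to signed minutes."""
    text = text.strip()
    sign = -1 if text.startswith("-") else 1
    hours, minutes = text.lstrip("+-").split(":")
    return sign * (int(hours) * 60 + int(minutes))

print(signed_minutes("-0:07"))  # -7
print(signed_minutes("1:23"))   # 83
```

Once everything is in plain signed minutes you can add, average, and convert back to h:mm for display however you like.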

Statistically removing erroneous values

We have an application where users enter prices all day. These prices are recorded in a table with a timestamp and then used for producing charts of how the price has moved... Every now and then a user enters a price wrongly (e.g. puts in a zero too many or too few), which somewhat ruins the chart (you get big spikes). We've even put in an extra confirmation dialogue if the price moves by more than 20%, but this doesn't stop them entering wrong values...
What statistical method can I use to analyse the values before I chart them to exclude any values that are way different from the rest?
EDIT: To add some meat to the bone. Say the prices are share prices (they are not but they behave in the same way). You could see prices moving significantly up or down during the day. On an average day we record about 150 prices and sometimes one or two are way wrong. Other times they are all good...
Calculate and track the standard deviation for a while. After you have a decent backlog, you can disregard the outliers by seeing how many standard deviations away they are from the mean. Even better, if you've got the time, you could use the info to do some naive Bayesian classification.
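A minimal sketch of that idea in Python, assuming the backlog is just a list of recent good prices; the 3-standard-deviation threshold is an arbitrary choice you would tune:

```python
import statistics

def is_outlier(new_price, history, threshold=3.0):
    """Flag a new price if it is more than `threshold` standard deviations
    away from the mean of the historical backlog."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return abs(new_price - mean) > threshold * stdev

# Hypothetical backlog of correctly entered prices
history = [100.2, 100.5, 99.8, 100.1, 100.4, 99.9, 100.6, 100.3]
print(is_outlier(100.7, history))   # False - ordinary movement
print(is_outlier(1003.0, history))  # True  - extra zero, exclude from the chart
```

The important detail is that the mean and standard deviation come from the backlog, not from a sample that already contains the suspect value, otherwise a big typo inflates the deviation and masks itself.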
That's a great question, but it may lead to quite a bit of discussion, as the answers could be very varied. It depends on:
how much effort are you willing to put into this?
could some answers genuinely differ by +/-20%, or whatever test you invent? So will there always be a need for some human intervention?
and to invent a relevant test I'd need to know far more about the subject matter.
That being said the following are possible alternatives.
A simple test against the previous value (or the mean/mode of the previous 10 or 20 values) would be straightforward to implement.
The next level of complexity would involve some statistical measurement of all values (or previous x values, or values of the last 3 months), a normal or Gaussian distribution would enable you to give each value a degree of certainty as to it being a mistake vs. accurate. This degree of certainty would typically be expressed as a percentage.
See http://en.wikipedia.org/wiki/Normal_distribution and http://en.wikipedia.org/wiki/Gaussian_function there are adequate links from these pages to help in programming these, also depending on the language you're using there are likely to be functions and/or plugins available to help with this
A more advanced method could be some sort of learning algorithm that takes other parameters into account (on top of the last x values); it could consider the product type or manufacturer, for instance, or even the time of day or the user who entered the figure. This option seems way over the top for what you need, however; it would require a lot of work to code and also to train the learning algorithm.
I think the second option is the correct one for you. Using standard deviation (a lot of languages contain a function for this) may be a simpler alternative: it is simply a measure of how far a value has deviated from the mean of the x previous values. I'd put the standard deviation option somewhere between options 1 and 2; see the sketch below.
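A rolling-window version of that in Python, assuming pandas is available and the prices come in as a single series; the window size of 4 and the 3-sigma cut-off are assumptions to tune against your own data:

```python
import pandas as pd

# Hypothetical stream of entered prices; 1004.0 is an "extra zero" typo.
prices = pd.Series([100.2, 100.5, 99.8, 100.1, 1004.0, 100.4, 100.3, 99.9])

# Mean and standard deviation of the previous x values (here x = 4),
# shifted so the current value is not included in its own baseline.
window = 4
rolling_mean = prices.rolling(window).mean().shift(1)
rolling_std = prices.rolling(window).std().shift(1)

# Flag anything more than 3 standard deviations from the recent mean.
suspect = (prices - rolling_mean).abs() > 3 * rolling_std
print(prices[suspect])
```

In practice you would drop the flagged values from the history before they enter the next window, so one typo doesn't inflate the standard deviation for the points that follow it.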
You could measure the standard deviation in your existing population and exclude those that are greater than 1 or 2 standard deviations from the mean?
It's going to depend on what your data looks like to give a more precise answer...
Or graph a moving average of prices instead of the actual prices.
Quoting from here:
Statisticians have devised several methods for detecting outliers. All the methods first quantify how far the outlier is from the other values. This can be the difference between the outlier and the mean of all points, the difference between the outlier and the mean of the remaining values, or the difference between the outlier and the next closest value. Next, standardize this value by dividing by some measure of scatter, such as the SD of all values, the SD of the remaining values, or the range of the data. Finally, compute a P value answering this question: If all the values were really sampled from a Gaussian population, what is the chance of randomly obtaining an outlier so far from the other values? If the P value is small, you conclude that the deviation of the outlier from the other values is statistically significant.
Google is your friend, you know. ;)
For your specific question of plotting, and your specific scenario of an average of 1-2 errors per day out of 150, the simplest thing might be to plot trimmed means, or the range of the middle 95% of values, or something like that. It really depends on what value you want out of the plot.
If you are really concerned with the true max and true min of a day's prices, then you have to deal with the outliers as outliers and properly exclude them, probably using one of the outlier tests proposed above (a data point is x% more than the next point, or than the last n points, or more than 5 standard deviations away from the daily mean). Another approach is to look at what happens after the suspect point: a genuine outlier will show a sharp upturn followed by a sharp downturn.
If however you care about the overall trend, plotting the daily trimmed mean, median, and 5th and 95th percentiles will portray the history well.
Choose your display methods and how much outlier detection you need to do based on the analysis question. If you care about medians or percentiles, the outliers are probably irrelevant.
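If the goal is the plot rather than the raw extremes, here is a short Python sketch of that daily summary. The synthetic data, the column names, and the 5% trim are all assumptions for illustration; pandas and scipy are assumed to be available.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Simulated table: one row per entered price, with a timestamp column.
rng = np.random.default_rng(0)
timestamps = pd.date_range("2024-01-01 09:00", periods=300, freq="30min")
prices = rng.normal(100, 0.5, size=300)
prices[40] *= 10   # simulate an "extra zero" typo
prices[220] /= 10  # simulate a "missing zero" typo
df = pd.DataFrame({"timestamp": timestamps, "price": prices})

daily = df.groupby(df["timestamp"].dt.date)["price"].agg(
    trimmed_mean=lambda s: stats.trim_mean(s, 0.05),  # drop top/bottom 5%
    median="median",
    p5=lambda s: s.quantile(0.05),
    p95=lambda s: s.quantile(0.95),
)
print(daily)
```

The trimmed mean and the 5th/95th percentile band stay sensible even on the days containing the two typos, whereas a plain daily max/min chart would show the spikes.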
