Rate increase analysis, way to calculate - statistics

The story is that I need to go over the auditors' note and provide feedback for every provision where the Increase/Decrease in Column H is > 20%. Looking into the provided doc, I don't get how they calculate this, e.g. = 1 - (RateOld / RateNew)
There is nothing special behind those rates: a higher rate is better and indicates better participation.
I need to be very careful before jumping to conclusions, as I don't want to mess with the auditor's approach. Does anybody know the logic/science/approach behind this formula? I pasted the more traditional ones I use on the right.
Best

Thanks all. I assume they just switched the New and Old columns in the provided sheet, so we divide by the New Rate rather than the Old one. Those formulas are the same, and the first one is always positive:
1 - (G/D) = (D - G)/D
Logically, this answers the question of how much the change was last year, rather than this year.
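To see the practical difference between dividing by the old rate and dividing by the new one, here is a quick sketch with made-up rates (not taken from the audit sheet):

# Made-up rates purely for illustration; not taken from the audit sheet.
rate_old, rate_new = 0.50, 0.80

traditional = (rate_new - rate_old) / rate_old  # change relative to the old rate -> 0.60
auditors    = 1 - (rate_old / rate_new)         # change relative to the new rate -> 0.375

print(traditional, auditors)

Both flag an increase, but the second form expresses the change as a share of the new rate, so it always looks smaller for increases and larger for decreases than the traditional version.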

Related

Use Excel to optimise KFC order

Don't laugh, but, from time to time, my friends and I host a multiple-course KFC dinner, and I have a spreadsheet to optimise the order. This is to make sure we order the right combination of 'bucket'-type items (i.e. SKUs that contain multiple pieces, often of different types):
to minimise leftover items
to reduce the total cost
Here is the spreadsheet I currently use, and here's a screenshot:
To use it, you first specify the number of participants in A2, and what you want each person to have in E2:E6 (we're really only interested in the chicken, so 'sides' are treated as generic to simplify).
Here's the manual part, that I'd like to improve.
The next step is to look at the ideal totals for each item (F2:F6), and to try to set the right quantities (H12:H20) of the 'bucket'-type SKUs that I have recreated (A11:G20), so that the output totals (H21:M21) match the ideal totals (F2:F6).
The optimisation part is to get the deltas (H22:M22) as close to zero as possible, and to get the total cost (N21) as low as possible.
So, my question is: is there a way to do this better? I think Excel has some sort of Solver functionality, but I'm afraid I don't know how I'd go about even starting to use that, as my Excel skills are pretty rudimentary. Oh, and in case it makes a difference regarding functionality, I'm using Excel for Mac v16.37.
Any thoughts gratefully appreciated! :)
I can't take any credit for this, but am happy to say that GSerg left me a couple of comments that pointed me in the right direction, and I now have Solver set up to organise my chicken parties!
Here are the parameters for anyone who is curious:
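If anyone would rather script this kind of optimisation than drive the Solver dialog, here is a rough sketch of it as a small integer program (needs scipy 1.9+). The SKU contents, prices and targets below are invented placeholders, not my actual sheet or the real menu:

# Rough sketch: pick integer quantities of 'bucket' SKUs so the piece totals meet
# the per-item targets at minimum cost. All numbers are invented placeholders.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# rows = SKUs, columns = item types (e.g. drumsticks, wings, sides)
pieces_per_sku = np.array([
    [6, 0, 0],   # hypothetical "6 drumsticks" bucket
    [0, 8, 0],   # hypothetical "8 wings" box
    [2, 2, 1],   # hypothetical mixed bucket
])
price_per_sku = np.array([9.99, 7.49, 11.99])  # invented prices
targets = np.array([10, 12, 4])                # ideal totals per item type

# Constraint: for each item type, pieces supplied >= target (leftovers allowed)
meets_targets = LinearConstraint(pieces_per_sku.T, lb=targets, ub=np.inf)

result = milp(
    c=price_per_sku,                          # minimise total cost
    constraints=meets_targets,
    integrality=np.ones(len(price_per_sku)),  # whole buckets only
    bounds=Bounds(0, np.inf),                 # can't order a negative bucket
)
print(result.x, result.fun)                   # quantity of each SKU and the total cost

This version only minimises cost and tolerates leftovers; penalising the deltas in H22:M22 as well would mean adding those surplus terms to the objective.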

Dividing people into groups based on strength and rank

Edited:
After using Solver, which Saulo suggested, I have managed to get Excel sorting players into groups, up to 8 groups of 3. However, I run into trouble when going further; ideally I now need to be able to do 18 groups of 3. Even with the same settings, obviously adjusted for the increase in groups, Excel seems to give up on the process and fails. Any suggestions for adapting to this?
I am trying to figure out an easy and reasonably accurate way, without going too crazy with the math and formulas (I am a beginner with Excel and coding in general), to calculate the ideal groups of 3 based on rank and strength for a video game.
I want to pair the strongest with the weakest and then fill the gaps evenly with the third person, so that each team's overall strength is roughly the same.
The factors I have are a designated leader (rank) and an overall power level (strength).
Doing this manually isn't too hard, but trying to automate it is. Any thoughts or suggestions would be amazing!
Something like this, but automated, is where I am getting stuck, as I want to be able to add more players and adjust strengths as they come along.
Hope this makes sense.
Jordan, you have a classic situation for using Solver. First of all, you have to make Solver available in your Excel: select File -> Options -> Add-ins and enable the Solver Add-in. The Solver button will then appear on the "Data" tab.
Solver is about solving a problem with specific conditions, changing specific cells, with a specific purpose. In your case, the purpose is to create teams with the minimal strength difference. A condition of your problem is that the teams should have the same number of players. See how I organized the sheet.
Sheet
When you open Solver, the first field is "Set Objective". Our goal is to make the difference between the teams as small as possible, so we select the cell with the difference between the teams. Then we tell Excel that we want that cell to take the minimal value (choosing "Min").
Then we have to tell Excel which cells it may change to achieve our goal. In this case, Excel can change each player's team, so we select the cells with the team assignments.
Last, we have to tell Excel the conditions (or constraints) of the problem. The first constraint is that team 1 has a total of three players, and likewise team 2 has a total of three players. Then we tell Excel the lower and upper limits for the variable cells: in this case, greater than or equal to 1, AND (as another constraint) less than or equal to 2.
OK. Now we have a problem with a specific goal, changing the values of specific cells, under specific constraints, so Solver can work. Then choose the solving method "Evolutionary". Honestly, I don't know the difference between the methods, but in my experience the Evolutionary method works better than the others for this kind of problem.
I recommend reading the Excel tutorial on Solver. At first we all think it is too difficult, but believe me, it is simpler than it seems.
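If the Evolutionary engine gives up at 18 groups of 3, a scripted version of the "pair the strongest with the weakest" idea from the question scales easily. Here is a rough sketch (a greedy snake draft, not Solver's optimiser) with invented player names and strengths:

# Rough sketch: balance players into teams of 3 by dealing them out strongest-first
# in a 'snake' order. Player names and strength values are invented placeholders.
strengths = {"P1": 95, "P2": 90, "P3": 88, "P4": 80, "P5": 76, "P6": 70,
             "P7": 66, "P8": 61, "P9": 55, "P10": 50, "P11": 42, "P12": 30}

players = sorted(strengths, key=strengths.get, reverse=True)  # strongest first
n_teams = len(players) // 3
teams = [[] for _ in range(n_teams)]

# Deal players to teams left-to-right, then right-to-left, and so on, so each
# team ends up with one strong, one middling and one weak player.
for i, player in enumerate(players):
    rnd, pos = divmod(i, n_teams)
    team_index = pos if rnd % 2 == 0 else n_teams - 1 - pos
    teams[team_index].append(player)

for team in teams:
    print(team, sum(strengths[p] for p in team))

It is not guaranteed to be optimal the way a solved integer program is, but it reproduces the manual pairing logic and handles 54 players (18 groups of 3) instantly.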

Excel Index Function

I just want to thank you guys in advance. I think you guys are doing a great job in helping people out with programming stuff. Pats on the back for all of you.
Here is what I've been working on: I have daily stock price return data on about 4000 stocks. I want to add them to my portfolio after observing their performance for 12 months. I will choose the top 10% best performers and bottom 10% worst performers. I will create multiple portfolios over a period of time. I have done that with no problem.
I want to use the INDEX function to calculate the daily return of my portfolio. Not all 4000 stocks are in my portfolio; about 300 stocks are in it at any given time. The daily portfolio return will be calculated by multiplying the weights (they are equal weighted, so 1/300) by each stock's return on the specific date. I assume it has to do with a combination of the INDEX, SUMPRODUCT, and IF or MATCH functions.
I have been thinking about this for a long time and I just can't get to the bottom of it. I have attached pictures for a portion of what I was working on, which I think will give you a good picture of what I'm trying to do. I bet this is such an easy thing for you guys. I hope you can help me out! Thanks again!
PICTURES: IN or OUT portfolio & stock's individual returns
Charles
Not sure I understood your problem, but here is a tentative suggestion:
You get data for 4000 stocks while you are monitoring 300, so you need to find the correct ones within your sheet (there will be 3700 that will not match anything).
If you have your stocks listed in, say, column "A", you could use the LOOKUP function (well explained on the Web). If you need to get the row of your stock, you can use the MATCH function.
If this is not what you are looking for, it means that I (at least) did not understand you, so you would need to add details to your question.
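If the spreadsheet route stays stubborn, the same calculation is easy to sanity-check outside Excel. Here is a rough pandas sketch; the tickers, dates and returns are invented, and the in/out table plays the role of the portfolio flags in the pictures:

# Rough sketch: equal-weighted daily portfolio return from a returns table and an
# in/out membership table. Tickers, dates and numbers are invented placeholders.
import pandas as pd

returns = pd.DataFrame(
    {"AAA": [0.010, -0.005, 0.002],
     "BBB": [0.003, 0.007, -0.001],
     "CCC": [-0.020, 0.004, 0.006]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]))

# 1 = the stock is in the portfolio that day, 0 = it is not (same shape as returns)
in_portfolio = pd.DataFrame(
    {"AAA": [1, 1, 0], "BBB": [0, 1, 1], "CCC": [1, 1, 1]},
    index=returns.index)

# Equal weighting: sum of (flag * return) divided by the number of holdings that
# day, i.e. the same idea as SUMPRODUCT(flag_row, return_row) / SUM(flag_row).
portfolio_return = (returns * in_portfolio).sum(axis=1) / in_portfolio.sum(axis=1)
print(portfolio_return)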

Removing lower/upper fence outliers from input data before it is evaluated

What I have attempted:
AVERAGEIF(B11:V11,">+MEDIAN(B11:V11)")
What I am trying to do:
I would like to take the average of the upper half of the given data. Elaborating more: I would like a formula that removes a given lower fence of outliers and then works on the data that remains. I would greatly prefer to keep this within one cell (not grabbing different results from formulas within multiple cells).
Update:
Following through, I found the solution... I think.
One thing I should have explained further:
The data coming in replicates a typical sqrt function.
What I wanted to achieve is to capture the mean of the "plateau" of the data.
The equation I used was:
=AVERAGEIF(B3:B62,(">"&+TRIMMEAN(B3:B62,0.8)),B3:B62)
This was something I just copied and pasted; of course, "B3" and "B62" are significant only for my application.
My rough explanation of the equation:
TRIMMEAN(B3:B62,0.8) discards 80% of the points (40% from each tail) and returns the mean of the central 20%; AVERAGEIF then averages only the values greater than (">") that trimmed mean. So for my application, this SHOULD give me a rough mean of the "plateau" of the data I would like to find the mean for.
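For anyone who wants to sanity-check this outside Excel, here is a rough Python equivalent using made-up, roughly square-root-shaped data:

# Rough Python equivalent of =AVERAGEIF(range,">"&TRIMMEAN(range,0.8)): the average
# of the values that lie above a heavily trimmed mean. The data is invented.
import numpy as np
from scipy.stats import trim_mean

data = np.sqrt(np.linspace(0.1, 25.0, 60))    # roughly sqrt-shaped, like the question

# Excel's TRIMMEAN(range, 0.8) discards 80% of the points in total (40% from each
# tail) and averages the middle 20%; scipy's trim_mean takes the fraction to cut
# from EACH tail, hence 0.4 here.
threshold = trim_mean(data, 0.4)

plateau_mean = data[data > threshold].mean()  # the AVERAGEIF(..., ">"&threshold) part
print(threshold, plateau_mean)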
This formula calculates the MEDIAN() of the range, then AVERAGEIF() only grabs the values that are greater than or equal to (>=) the median, giving you the average of the 'top half' of your values.
=AVERAGEIF(A1:A10,">="&MEDIAN(A1:A10))
Hope this helps!

Statistically removing erroneous values

We have an application where users enter prices all day. These prices are recorded in a table with a timestamp and then used for producing charts of how the price has moved... Every now and then a user enters a price wrongly (e.g. puts in one zero too many or too few), which somewhat ruins the chart (you get big spikes). We've even put in an extra confirmation dialogue if the price moves by more than 20%, but this doesn't stop them entering wrong values...
What statistical method can I use to analyse the values before I chart them to exclude any values that are way different from the rest?
EDIT: To put some meat on the bones: say the prices are share prices (they are not, but they behave in the same way). You could see prices moving significantly up or down during the day. On an average day we record about 150 prices, and sometimes one or two are way wrong. Other times they are all good...
Calculate and track the standard deviation for a while. After you have a decent backlog, you can disregard the outliers by seeing how many standard deviations away they are from the mean. Even better, if you've got the time, you could use the info to do some naive Bayesian classification.
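As a rough sketch of that idea (the prices below are made up, with one obvious fat-finger entry):

# Rough sketch: drop entries more than k standard deviations from the mean before
# charting. Prices are invented; 1015.0 is the deliberate fat-finger value.
import numpy as np

prices = np.array([101.2, 101.5, 100.9, 102.1, 1015.0, 101.8, 102.4])
k = 2.0

mean, sd = prices.mean(), prices.std()
keep = np.abs(prices - mean) <= k * sd   # True for values within k standard deviations
cleaned = prices[keep]
print(cleaned)

In practice you would take the mean and standard deviation from the accumulated backlog (or a rolling window) rather than from the same handful of points you are cleaning.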
That's a great question, but it may lead to quite a bit of discussion as the answers could be very varied. It depends on:
how much effort you are willing to put into this;
whether some values could genuinely differ by +/-20% (or whatever threshold you invent), and so whether there will always be a need for some human intervention;
and to invent a relevant test I'd need to know far more about the subject matter.
That being said, the following are possible alternatives.
A simple test against the previous value (or the mean/mode of the previous 10 or 20 values) would be straightforward to implement.
The next level of complexity would involve some statistical measurement of all values (or the previous x values, or the values of the last 3 months). A normal or Gaussian distribution would enable you to give each value a degree of certainty as to whether it is a mistake or accurate. This degree of certainty would typically be expressed as a percentage.
See http://en.wikipedia.org/wiki/Normal_distribution and http://en.wikipedia.org/wiki/Gaussian_function; there are adequate links from these pages to help in programming this, and depending on the language you're using there are likely to be functions and/or plugins available to help with it.
A more advanced method would be some sort of learning algorithm that could take other parameters into account (on top of the last x values): it could consider the product type or manufacturer, for instance, or even the time of day or the user who entered the figure. This option seems way over the top for what you need, though; it would require a lot of work to code and to train the learning algorithm.
I think the second option is the right one for you. Using the standard deviation (a lot of languages have a function for this) may be a simpler alternative; it is simply a measure of how far a value has deviated from the mean of the x previous values. I'd put the standard deviation option somewhere between options 1 and 2.
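As a rough sketch of that middle option, comparing each entry against the mean and standard deviation of the previous x values (all prices invented):

# Rough sketch of option 2: flag an entry if it is too many standard deviations
# away from the mean of the previous x values. All prices are invented.
import pandas as pd

prices = pd.Series([100.1, 100.4, 99.8, 100.9, 1003.0, 100.6, 100.2])  # 1003.0 is a typo
x, k = 4, 3.0                                # window size and z-score cut-off

rolling = prices.shift(1).rolling(x)         # statistics of the *previous* x values
z = (prices - rolling.mean()) / rolling.std()
suspect = z.abs() > k                        # True where the entry looks like a mistake
print(prices[suspect])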
You could measure the standard deviation of your existing population and exclude the values that fall more than 1 or 2 standard deviations from the mean?
A more precise answer is going to depend on what your data looks like...
Or graph a moving average of prices instead of the actual prices.
Quoting from here:
Statisticians have devised several methods for detecting outliers. All the methods first quantify how far the outlier is from the other values. This can be the difference between the outlier and the mean of all points, the difference between the outlier and the mean of the remaining values, or the difference between the outlier and the next closest value. Next, standardize this value by dividing by some measure of scatter, such as the SD of all values, the SD of the remaining values, or the range of the data. Finally, compute a P value answering this question: If all the values were really sampled from a Gaussian population, what is the chance of randomly obtaining an outlier so far from the other values? If the P value is small, you conclude that the deviation of the outlier from the other values is statistically significant.
Google is your friend, you know. ;)
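A bare-bones sketch of the quoted recipe, ignoring the multiple-comparison correction a proper Grubbs test would add (the prices are invented):

# Rough sketch: standardise the most extreme point against the mean and SD of the
# remaining values, then ask how likely such a deviation is under a Gaussian.
import numpy as np
from scipy.stats import norm

prices = np.array([101.2, 101.5, 100.9, 102.1, 109.4, 101.8, 102.4])  # invented

idx = np.abs(prices - prices.mean()).argmax()      # the candidate outlier
rest = np.delete(prices, idx)
z = (prices[idx] - rest.mean()) / rest.std(ddof=1)
p_value = 2 * norm.sf(abs(z))                      # two-sided tail probability
print(prices[idx], z, p_value)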
For your specific question of plotting, and your specific scenario of an average of 1-2 errors per day out of 150, the simplest thing might be to plot trimmed means, or the range of the middle 95% of values, or something like that. It really depends on what value you want out of the plot.
If you are really concerned with the true max and true min of a day's prices, then you have to deal with the outliers as outliers and properly exclude them, probably using one of the outlier tests previously proposed (a data point is x% more than the next point, or than the last n points, or more than 5 standard deviations away from the daily mean). Another approach is to look at what happens after the suspect point: a genuine outlier will show a sharp upturn followed by a sharp downturn (or vice versa).
If, however, you care about the overall trend, plotting the daily trimmed mean, median, and 5th and 95th percentiles will portray the history well.
Choose your display methods, and how much outlier detection you need to do, based on the analysis question. If you only care about medians or percentiles, the outliers are probably irrelevant.
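As a small sketch of that plotting approach, with a simulated day of roughly 150 entries and one wrongly keyed price:

# Rough sketch: summarise a day by a trimmed mean and the 5th/95th percentiles,
# which shrug off one or two bad entries. The price series is simulated, not real.
import numpy as np
from scipy.stats import trim_mean

rng = np.random.default_rng(42)
day_prices = 100 + np.cumsum(rng.normal(0.0, 0.2, size=150))  # plausible intraday drift
day_prices[60] *= 10                                          # one wrongly keyed price

print("plain mean   :", day_prices.mean())                    # dragged up by the typo
print("trimmed mean :", trim_mean(day_prices, 0.02))          # 2% cut from each tail
print("5th / 95th   :", np.percentile(day_prices, [5, 95]))   # unaffected by one bad point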
