I have to analyze the delta impact for different categories, but we need to focus only on the ones with the highest priority. Example:
Col.A (Dollar Value)   Col.B (Forecast)   Col.C (Actual)   Col.D (Delta)
2000                   50                 30               20
60                     40                 10               50
Is SUMPRODUCT the only or best method? Also, there are different attributes by which I need to look at the data.
I import data into Excel using Power Query, where I add a new column to calculate a usage percent. I then want to use Power View to visualize this data. However, when applying filters on the data and wanting to view the average usage percent across filters, the results are incorrect as Power View is simply averaging the percentages, rather than calculating the totals of each parameter and applying the formula afterwards. Is there a way to write my formula, so the percentage will be calculated after filters are applied in Power View?
In my example, Usage% = Direct/(Total-Fringe).
Because the denominator is not constant, the usage percent for Group A is not simply an average of the Usage% cells assigned to Group A. The average of the Usage% cells in Group A is 75%, but the correct usage percent calculated for Group A from the parameter totals is 98/(168-40) = 76.56%. Power View shows the incorrect 75%, as it is simply averaging the cells that correspond to Group A.
I would like to use the filters in Power View to view charts showing usage percent at various levels, including Group and Division, along with other information not shown in the example.
ID Group Division Direct Total Fringe Usage%
1 A AA 40 40 0 1.00
2 A AA 20 40 10 0.67
3 A AB 18 40 15 0.72
4 A AB 20 48 15 0.61
5 B BA 40 40 0 1.00
6 B BA 18 40 12 0.64
7 B BB 12 40 20 0.60
8 B BB 40 48 0 0.83
Create Usage % as a Measure in your Data Model, instead of as a column:
Usage % =
DIVIDE (
    SUM ( MyTable[Direct] ),
    SUM ( MyTable[Total] ) - SUM ( MyTable[Fringe] ),
    BLANK()
)
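Because a measure is evaluated inside the current filter context, the sums are computed first and the division happens afterwards. With a filter on Group A, for example, this should return 98 / (168 - 40) = 76.56%, matching the hand calculation above rather than the 75% cell average.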
I am looking for a way to do some type of lookup/match to come up with bonuses. The bonus is based on what was sold, for how much, how much the customer bought last year, and how much the amount increased.
Example (line 2 of the table below): say they sold $900 worth of pens to Joe (New Amount is within 1-999), and last year Joe only bought $23 worth of pens (Previous Amount is within 1-24), so they get a $25 bonus (Bonus).
There also has to be a minimum increase amount (the Min Increase column). Say Joe bought $999 worth of pens last year and $1,000 worth of pens this year: the salesperson shouldn't get a bonus, because it was only a buck of an increase and it has to be at least a $50 increase in this case. That's what the Min Increase column is for.
Group   Min Increase   Prev Amount Min   Prev Amount Max   New Amount Min   New Amount Max   Bonus
Pens    50             1                 24                1000             999999           45
Pens    50             1                 24                1                999              25
Pens    50             25                100               1000             999999           45
Pens    5              25                100               1                999              25
Paper   10             1                 24                1000             999999           50
Paper   10             1                 24                1                999              25
Paper   10             25                100               1000             999999           50
Paper   5               25               100               1                999              25
I started looking at INDEX/MATCH, but it's not enough. Then I thought of summing, but I'm not really adding anything together.
=SUMIFS(B3:B10, G1:G10, "Pens", D3:D10, "50")
Also, =INDEX(range2,MATCH(TRUE,COUNTIF(range1,range2)>0,0)) won't work because that's only two ranges.
It also has to be something they can update constantly.
The actual data looks something like this
Sale Prev. Group
900 23 Pens
So you'd need to find the difference, check that it meets the minimum increase for the group and that the amounts fall in the right ranges (900 - 23 > 50), and then return a value: 23 is between 1 and 24 and 900 is between 1 and 999, so it's a $25 bonus.
Does anyone have any suggestions? I'm looking into INDEX/MATCH, but I can't figure out how to do it with ranges.
Thanks
The formula is based on AGGREGATE, which performs array-like operations. As such, keep the references inside the AGGREGATE function as short as you can and avoid full column references. Based on the layout of information in the picture above, place the following formula in L3 and copy it down as required.
=IFERROR(INDEX(G:G,AGGREGATE(15,6,ROW($A$3:$A$10)/(($A$3:$A$10=$K3)*($C$3:$C$10<=$J3)*($D$3:$D$10>=$J3)*($E$3:$E$10<=$I3)*($F$3:$F$10>=$I3)*((I3-J3)>=$B$3:$B$10)),1)),"No Bonus")
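How it works (assuming the bonus table occupies A3:G10 and the sale amount, previous amount and group sit in columns I, J and K, as the cell references suggest): each table row that fails any condition has its ROW() divided by zero and turned into an error, AGGREGATE's option 6 ignores those errors, function 15 (SMALL) with k = 1 returns the lowest surviving row number, and INDEX pulls the matching bonus from column G. For the sample data (Sale 900, Prev. 23, Pens), the increase of 877 clears the 50 minimum, 23 falls within 1-24 and 900 within 1-999, so only the second Pens row in the table survives and the formula returns 25.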
Caveat: If for some reason, the sales situation matches multiple rows in your table, then it will return the bonus corresponding to the lowest row number that met all the criteria.
I have a column with ordinal values. I want another column that ranks them into equal groups (relative to their value).
Example: If I have a score and I want to divide to 5 equal groups:
Score
100
90
80
70
60
50
40
30
20
10
What function do I use in the new column to get this eventually:
Score Group
100 5
90 5
80 4
70 4
60 3
50 3
40 2
30 2
20 1
10 1
Thanks! (I'm guessing the solution is somewhere in mod, row and count - but I couldn't find any good solution for this specific problem)
If you don't care about how the groups are split for groups that aren't evenly divisible, you can use this formula and drag down as far as necessary:
= FLOOR(5*(COUNTA(A:A)-COUNTA(INDEX(A:A,1):INDEX(A:A,ROW())))/COUNTA(A:A),1)+1
Possibly a more efficient solution exists, but this is the first way I thought to do it.
Obviously you'll have to change the references to the A column if you want it in a different column.
See below for working example.
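As a worked check (assuming the ten scores sit in A1:A10 with no header row): COUNTA(A:A) is 10, so row 1 gives FLOOR(5*(10-1)/10, 1) + 1 = FLOOR(4.5, 1) + 1 = 5, and row 5 gives FLOOR(5*(10-5)/10, 1) + 1 = FLOOR(2.5, 1) + 1 = 3, matching the Group column in the question.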
I have a large data set with millions of records which is something like
Movie Likes Comments Shares Views
A 100 10 20 30
A 102 11 22 35
A 104 12 25 45
A *103* 13 *24* 50
B 200 10 20 30
B 205 *9* 21 35
B *203* 12 29 42
B 210 13 *23* *39*
Likes, comments, etc. are rolling totals and they are supposed to increase. If there is a drop in any of them for a movie, then it is bad data and needs to be identified.
My initial thought was to group by movie and then sort within each group. I am using DataFrames in Spark 1.6 for processing, and this does not seem to be achievable, as there is no sorting within the grouped data in a DataFrame.
Building something for outlier detection could be another approach, but because of time constraints I have not explored it yet.
Is there any way I can achieve this?
Thanks !!
You can use the lag window function to bring the previous values into scope:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.lag

// Partition per movie, ordered by whatever field defines the record order
val windowSpec = Window.partitionBy('Movie).orderBy('maybesometemporalfield)

dataset.withColumn("lag_likes", lag('Likes, 1) over windowSpec)
  .withColumn("lag_comments", lag('Comments, 1) over windowSpec)
  .show
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-sql-functions.html#lag
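Building on that, here is a rough sketch of flagging the drops (assuming, purely for illustration, a Timestamp column that defines the record order; the prev_* and bad_* column names are also just placeholders):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lag}

// Order each movie's records by the assumed Timestamp column.
val w = Window.partitionBy(col("Movie")).orderBy(col("Timestamp"))

val flagged = dataset
  .withColumn("prev_likes", lag(col("Likes"), 1).over(w))
  .withColumn("prev_comments", lag(col("Comments"), 1).over(w))
  // A row is suspect when a rolling total is lower than its previous value.
  .withColumn("bad_likes", col("Likes") < col("prev_likes"))
  .withColumn("bad_comments", col("Comments") < col("prev_comments"))

// Shares and Views can be handled the same way.
flagged.filter(col("bad_likes") || col("bad_comments")).show()

The first record of each movie has no previous row, so its lagged values are null, the comparisons evaluate to null, and the filter simply drops it.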
Another approach would be to assign a row number (if there isn't one already), lag that column, and then join each row to its previous row so you can do the comparison.
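A minimal sketch of that alternative, under the same illustrative Timestamp assumption (only Likes is compared, to keep it short):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

val w = Window.partitionBy(col("Movie")).orderBy(col("Timestamp"))

// Number the rows within each movie, then join every row to the one before it.
val numbered = dataset.withColumn("rn", row_number().over(w))
val previous = numbered.select(
  col("Movie").as("p_movie"),
  col("rn").as("p_rn"),
  col("Likes").as("p_likes"))

val compared = numbered.join(previous,
  numbered("Movie") === previous("p_movie") && numbered("rn") === previous("p_rn") + 1)

// Rows where the rolling total went down are the bad records.
compared.filter(col("Likes") < col("p_likes")).show()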
HTH
I am in the wine reselling business, and we have a problem I've been trying to solve. We have 50-70 types of wine to be stored at any time, and around 500 tanks of various capacities. Each tank can only hold one type of wine. My job is to determine the minimum number of tanks to hold the maximum number of types of wine, each filled as close to its maximum capacity as possible, i.e. 100l of wine should not be stored in a 200l tank if two tanks of 60l and 40l also exist.
I've been doing the job by hand in Excel and want to try to automate the process, but macros and array formulas quickly get out of hand. I can write a simple program in C or Swift, but I'm stuck at finding a general algorithm. Any pointer on where I can start is much appreciated. A full solution and I will send you a bottle ;)
Edit: for clarification, I do know how many types of wine I have and their total quantities, e.g. Pinot at 700l, Merlot at 2000l, etc. These change every week. The tanks, however, have many different capacities (40, 60, 80, 100, 200 liters, etc.) and change at irregular intervals, since they have to be taken out for cleaning and replaced. Simply using 70 tanks to hold 70 types is not possible.
Also, the total quantity of wine never matches the total tank capacity, and I need to use the minimum number of tanks to hold the maximum amount of wine. In case of insufficient capacity, the amount of wine left over must be as small as possible (it will spoil quickly). If there is leftover, the amount left over of each type must be proportional to that type's quantity.
A simplified example of the problem is this:
Wine:
----------
Merlot 100
Pinot 120
Tocai 230
Chardonnay 400
Total: 850L
Tanks:
----------
T1 10
T2 20
T3 60
T4 150
T5 80
T6 80
T7 90
T8 80
T9 50
T10 110
T11 50
T12 50
Total: 830L
This greedy-DP algorithm attempts to perform a proportional split: for example, if you have 700l Pinot, 2000l Merlot and tank capacities 40, 60, 80, 100, 200, that means a total capacity of 480.
700 / (700 + 2000) = 0.26
2000 / (700 + 2000) = 0.74
0.26 * 480 = 125
0.74 * 480 = 355
So we will attempt to store 125l of the Pinot and 355l of the Merlot, to make the storage proportional to the amounts we have.
Obviously this isn't fully possible, because you cannot mix wines, but we should be able to get close enough.
To store the Pinot, the closest would be to use tanks 1 (40l) and 3 (80l), then use the rest for the Merlot.
This can be implemented as a subset sum problem:
d[i] = true if we can make sum i, false otherwise
d[0] = true, everything else false
sum_of_tanks = 0
for each tank i:
    sum_of_tanks += tank_capacities[i]
    for s = sum_of_tanks down to tank_capacities[i]:
        d[s] = d[s] OR d[s - tank_capacities[i]]
Compute the proportions, then run this for each type of wine you have (removing the tanks already chosen, which you can recover from the d array; I can detail this if you want). Look around d[computed_proportion] to find the closest achievable sum for each wine type.
This should be fast enough for a few hundred tanks, which I'm guessing don't have capacities larger than a few thousand liters.
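A rough sketch of that subset-sum step in Scala (the closestFill name and the example call are illustrative; it handles a single wine type and leaves out the per-wine loop and tank removal), with a choice array added so the selected tanks can be read back:

// Sketch of the subset-sum DP described above.
def closestFill(tankCapacities: Array[Int], target: Int): (Int, List[Int]) = {
  val total = tankCapacities.sum
  // d(s) is true when some subset of the tanks sums to exactly s.
  val d = Array.fill(total + 1)(false)
  // choice(s) remembers which tank was added last to reach sum s, for reconstruction.
  val choice = Array.fill(total + 1)(-1)
  d(0) = true

  var sumOfTanks = 0
  for (i <- tankCapacities.indices) {
    sumOfTanks += tankCapacities(i)
    // Go downwards so each tank is used at most once.
    for (s <- sumOfTanks to tankCapacities(i) by -1) {
      if (!d(s) && d(s - tankCapacities(i))) {
        d(s) = true
        choice(s) = i
      }
    }
  }

  // Pick the achievable sum closest to the proportional target for this wine.
  val best = (0 to total).filter(s => d(s)).minBy(s => math.abs(s - target))

  // Walk back through choice to list the tank indices making up that sum.
  var s = best
  var tanks = List.empty[Int]
  while (s > 0) {
    val i = choice(s)
    tanks = i :: tanks
    s -= tankCapacities(i)
  }
  (best, tanks)
}

// With the tanks from the example (40, 60, 80, 100, 200) and the 125l Pinot target:
// closestFill(Array(40, 60, 80, 100, 200), 125) gives (120, List(0, 2)), i.e. the 40l and 80l tanks.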