I am building a data model in Power Pivot for Excel 2013 and need to identify the maximum value in a column for a particular group. Unfortunately, what I thought would work, and what I found when searching previously, either gave me an error or wasn't applicable. (There was a similar question that dealt with calculated measures rather than calculated columns, and as far as I know it couldn't be replicated in the Power Pivot data view.)
Below is an indication of what I am trying to achieve; in this case, I am trying to calculate the Max % uptake column.
Group | % uptake | Max % uptake
A | 40 | 45
A | 22 | 45
A | 45 | 45
B | 12 | 33
B | 18 | 33
B | 33 | 33
C | 3 | 16
C | 16 | 16
C | 9 | 16
Many thanks
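Use this DAX formula as a calculated column in Power Pivot (substitute your own table and column names for Table1, [GROUP] and [UPTAKE]):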
=CALCULATE(MAX([UPTAKE]),FILTER(Table1,[GROUP]=EARLIER([GROUP])))
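To make the row-by-row logic of that calculated column concrete, here is a minimal Python sketch on plain lists, not Power Pivot. For each row, it filters the whole table down to rows whose group equals the current row's group, which is the comparison EARLIER enables, and then takes the maximum. The function name `max_per_group` is hypothetical.

```python
def max_per_group(groups, values):
    """For each row, return the max of `values` over all rows sharing that row's group.

    Mirrors FILTER(Table1, [GROUP] = EARLIER([GROUP])): the outer row's group is
    compared against every row of the table, then MAX is taken over the matches.
    """
    result = []
    for g in groups:  # outer row context (the row EARLIER refers back to)
        same_group = [v for gg, v in zip(groups, values) if gg == g]
        result.append(max(same_group))
    return result


groups = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
uptake = [40, 22, 45, 12, 18, 33, 3, 16, 9]
print(max_per_group(groups, uptake))
```

Like a row-by-row FILTER over the whole table, this is quadratic in the number of rows. Computing a per-group maximum once and looking it up would be linear.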
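Alternatively, on a regular worksheet, use this formula in cell C2. It assumes column A holds the group, column B the uptake, the range starts in row 1, and the rows for each group are contiguous: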
=MAX(INDIRECT(CONCATENATE("B",MATCH(A2,$A$1:$A$10,0),":B",SUMPRODUCT(MAX(($A$1:$A$10=A2)*(ROW($A$1:$A$10)))))))
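The worksheet formula finds the first row of the group (MATCH with exact match) and the last row of the group (the largest ROW where column A matches). It then takes MAX over column B between those rows. The Python sketch below mimics that approach on plain lists. The function name is hypothetical, and the second call shows why the rows of each group must be contiguous.

```python
def max_over_contiguous_block(groups, values, target):
    """Max of `values` between the first and last row whose group equals `target`.

    Like MATCH(A2, range, 0) for the first row and MAX((range=A2)*ROW(range))
    for the last row, followed by MAX over that block. Only correct if each
    group's rows are contiguous.
    """
    first = groups.index(target)                          # first matching row
    last = len(groups) - 1 - groups[::-1].index(target)   # last matching row
    return max(values[first:last + 1])


groups = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
uptake = [40, 22, 45, 12, 18, 33, 3, 16, 9]
print([max_over_contiguous_block(groups, uptake, g) for g in groups])

# If a group's rows are not contiguous, the block picks up other groups' rows:
mixed_groups = ["A", "B", "A"]
mixed_values = [1, 99, 2]
print(max_over_contiguous_block(mixed_groups, mixed_values, "A"))  # 99, not 2
```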
I have a large data set with millions of records; it looks something like this:
Movie Likes Comments Shares Views
A 100 10 20 30
A 102 11 22 35
A 104 12 25 45
A *103* 13 *24* 50
B 200 10 20 30
B 205 *9* 21 35
B *203* 12 29 42
B 210 13 *23* *39*
Likes, comments, etc. are running totals, so they are supposed to increase. If any of them drops for a movie, that is bad data and needs to be identified (the values marked with asterisks above).
My initial thought was to group by movie and then sort within each group. I am using DataFrames in Spark 1.6 for processing, and this does not seem achievable because there is no sorting within grouped data in a DataFrame.
Building something for outlier detection could be another approach, but because of time constraints I have not explored it yet.
Is there any way I can achieve this?
Thanks!
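You can use the lag window function to bring the previous row's values into scope and then compare them with the current row. (In Spark 1.6, window functions require a HiveContext.)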
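import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.lag
import sqlContext.implicits._ // enables the 'Column symbol syntax (sqlContext must be a HiveContext in 1.6)

// One window per movie, ordered by whatever field gives the record order
val windowSpec = Window.partitionBy('Movie).orderBy('maybesometemporalfield)

dataset.withColumn("lag_likes", lag('Likes, 1) over windowSpec)
  .withColumn("lag_comments", lag('Comments, 1) over windowSpec)
  .show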
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-sql-functions.html#lag
Another approach would be to assign a row number (if there isn't one already), lag that column, and then join each row to its previous row so you can do the comparison.
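The comparison step can be illustrated without Spark. Below is a minimal plain-Python sketch of the same idea: partition by movie, walk the rows in order, and compare each metric with the previous row's value (what `lag(col, 1)` supplies). It assumes the rows are already sorted by movie and by the temporal field, which the Scala snippet leaves as the placeholder `maybesometemporalfield`. The function name `find_drops` is hypothetical.

```python
from itertools import groupby

METRICS = ["Likes", "Comments", "Shares", "Views"]

# (Movie, Likes, Comments, Shares, Views), sorted by movie and then by time
rows = [
    ("A", 100, 10, 20, 30),
    ("A", 102, 11, 22, 35),
    ("A", 104, 12, 25, 45),
    ("A", 103, 13, 24, 50),
    ("B", 200, 10, 20, 30),
    ("B", 205, 9, 21, 35),
    ("B", 203, 12, 29, 42),
    ("B", 210, 13, 23, 39),
]


def find_drops(rows):
    """Return (movie, position_in_movie, [dropped metrics]) for every bad row.

    groupby only merges adjacent rows, so the input must be sorted by movie
    (the equivalent of partitionBy). The previous row plays the role of lag.
    """
    bad = []
    for movie, group in groupby(rows, key=lambda r: r[0]):
        previous = None  # lag is null for the first row of each partition
        for pos, row in enumerate(group):
            if previous is not None:
                dropped = [name for name, cur, prev in zip(METRICS, row[1:], previous[1:])
                           if cur < prev]
                if dropped:
                    bad.append((movie, pos, dropped))
            previous = row
    return bad


for item in find_drops(rows):
    print(item)
```

On the sample data, this flags exactly the rows containing the asterisked values in the question.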
HTH
I am trying to use the Solver add-in for Excel to minimize the average of a range of cells in the following table:
where PT = processing time and DD = due date.
The Tardiness cell has the following formula: =IF([@[Cum. PT]]-[@DD]>0;[@[Cum. PT]]-[@DD];0). That is, the time past the due date, or 0 if the job is not past its due date.
The cumulative cells use the obvious running-sum formulas.
The cell in the bottom right is the average. This is the value I want to optimize, and it should be done by switching the order of the rows.
Is it possible to do this in Solver, so that it only switches the order of entire rows?
As your problem is stated, it can be solved with the following steps.
Put Row, Real PT, and DD in a separate source table (SourceTable), like this:
Row | Real PT | DD
1 | 8 | 30
2 | 10 | 14
3 | 13 | 68
4 | 18 | 53
5 | 16 | 58
6 | 12 | 18
7 | 11 | 78
8 | 14 | 26
Build your table so that column A holds the row number, Real PT is calculated as =VLOOKUP(A2,SourceTable,2,FALSE), and DD is calculated as =VLOOKUP(A2,SourceTable,3,FALSE). All other calculations stay as you have already defined them.
Set up Solver with the following:
Objective Cell is $F$10
Minimize
By Changing $A$2:$A$9
Constraints $A$2:$A$9 <= 8, $A$2:$A$9 >= 1, $A$2:$A$9 = AllDifferent, $A$2:$A$9 = integer
Solving method = Evolutionary
With these settings, I get the following result...
...which may or may not be acceptable; the 84-day tardiness may be excessive. You could add constraints on maximum tardiness.
Edit: You can also choose GRG Nonlinear as the solving method, but you will need to go into Options and select Multistart. GRG Nonlinear takes much longer than Evolutionary to reach a solution.
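Evolutionary and multistart GRG are heuristics. With only 8 rows, though, there are 8! = 40,320 possible orderings, which is small enough to check every ordering and confirm whether Solver's answer is optimal. Below is a minimal Python sketch of that exhaustive search using the SourceTable values above. It is not something Solver does internally. The function names are hypothetical, and tardiness is computed as max(Cum. PT - DD, 0), as in the question's formula.

```python
from itertools import permutations

# SourceTable from the answer: row -> (Real PT, DD)
SOURCE = {
    1: (8, 30), 2: (10, 14), 3: (13, 68), 4: (18, 53),
    5: (16, 58), 6: (12, 18), 7: (11, 78), 8: (14, 26),
}


def average_tardiness(order):
    """Average of max(Cum. PT - DD, 0) when the rows are processed in `order`."""
    cumulative = 0
    total = 0
    for row in order:
        pt, dd = SOURCE[row]
        cumulative += pt                   # Cum. PT column
        total += max(cumulative - dd, 0)   # Tardiness column
    return total / len(order)


def best_order():
    """Try every ordering of rows 1..8 (the AllDifferent integer cells A2:A9)."""
    return min(permutations(SOURCE), key=average_tardiness)


order = best_order()
print(order, average_tardiness(order))
```

For comparison, the original order (1, 2, ..., 8) gives an average tardiness of 156 / 8 = 19.5.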
Solver cannot change the positions of rows.
In your case, I would just sort Tardiness from smallest to largest, which would give you the smallest average cumulative tardiness.
I have a pivot table in which I set a row to show a running total by date, and now I would like to use this row to create a calculated field. Is that possible?
If not, is there a formula for cumulative calculations in a calculated field?
I can supply more examples if more clarification is needed.
I want to do something like this:
week 1 2 3 4 5 6 7 8 9 10
count 20 20 21 25 26 27 28 29 21 21
cumulative count 20 40 61 86 112 139 167 196 217 238
If week is the base field, can I create a calculated field that computes something like the cumulative count? I am doing this because I need the cumulative count for further calculations, and if I use Show Values As Running Total, it seems I can't use that value in further calculations.
Hope this helps to clarify.
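The cumulative count row in the example is a running sum of the weekly counts, taken in week order. This short Python sketch reproduces the example table to pin down exactly what the calculated field would need to compute. It is only an illustration of the arithmetic, not a pivot-table feature.

```python
from itertools import accumulate

weeks = list(range(1, 11))
counts = [20, 20, 21, 25, 26, 27, 28, 29, 21, 21]

# Running total: each week's cumulative count is the sum of all counts up to that week.
cumulative = list(accumulate(counts))
for week, count, cum in zip(weeks, counts, cumulative):
    print(week, count, cum)
```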
There are time intelligence functions built into DAX. You could use TOTALYTD(), TOTALQTD(), and TOTALMTD() if you have a proper date dimension with contiguous, non-repeating dates ranging from January 1 in the first year you have data through December 31 in the last year you have data.
If you have a non-standard fiscal calendar, you can get the same effect as long as you have an index field for each time granularity of interest that increases over time.
CustomTotalYTD:=
CALCULATE(
    [<some measure>]
    ,FILTER(
        ALL( 'DimDate' )
        ,'DimDate'[FiscalYear] = MAX( 'DimDate'[FiscalYear] )
            && 'DimDate'[Date] <= MAX( 'DimDate'[Date] )
    )
)
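The FILTER in that measure keeps every date in the current fiscal year up to the current date, so the total restarts at each fiscal-year boundary. Below is a minimal Python sketch of the same filter logic, evaluated for one date at a time. The rows, including the assumed July fiscal-year start, are made-up illustration data, and `custom_total_ytd` is a hypothetical name.

```python
from datetime import date

# Hypothetical date-dimension rows: (Date, FiscalYear, measure value).
# Assumes, for illustration only, a fiscal year that starts on July 1.
rows = [
    (date(2023, 6, 1), 2023, 5),
    (date(2023, 6, 30), 2023, 7),
    (date(2023, 7, 1), 2024, 4),
    (date(2023, 7, 15), 2024, 6),
]


def custom_total_ytd(rows, current_date, current_fy):
    """Sum over all rows with the same fiscal year and Date <= the current date.

    Mirrors FILTER(ALL('DimDate'), FiscalYear = MAX(FiscalYear) && Date <= MAX(Date))
    when the current filter context is a single date.
    """
    return sum(value for d, fy, value in rows
               if fy == current_fy and d <= current_date)


for d, fy, _ in rows:
    print(d, fy, custom_total_ytd(rows, d, fy))
```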
I am looking for a formula that will return the earliest date from a column, based on the values in other cells. (Actually I want both a Min and a Max date, but I am assuming the Max solution will mirror any Min solution.) I know I can return the date I want just by using MIN and specifying the range of cells I want, but ideally I want the formula to be dynamic. I have looked around and believe I may need a combination of INDEX and MATCH, but I can't find any examples that use MIN and MAX. I have considered using dynamic named ranges to define my task groups, but that would mean defining a fixed number of task groups, and there could be many.
The sheet below shows some sample data on the left of the workbook, with the summary data on the right. The "hidden worker column" was an idea I had that I thought might make the solution easier. I want the summary data on the right to use either column A, or column B if that's easier, to display the min and max dates based on the section number in column F. Is this possible without VBA?
@mthierer's link is good. If you want to avoid adding a "helper column", you could try the following (data in A1:C10; summary table in E1:G2):
{=MIN(IF(ROUNDDOWN($A$1:$A$10, 0)=$E1, $B$1:$B$10))} (or {=MAX(...)} with $C$1:$C$10)
Note that you have to enter the formula as an array formula with Ctrl+Shift+Enter.
Data (A1:C10):
1 23 57
1.1 42 91
1.2 35 100
1.3 39 80
1.4 28 51
1.5 30 96
2 33 52
2.1 11 73
2.2 48 80
2.3 16 59
Summary Results (E1:G2):
1 23 100
2 11 80
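The array formula truncates each section number in column A to its whole-number part with ROUNDDOWN(..., 0), keeps only the rows that match the summary row's section, and takes MIN of column B (or MAX of column C) over those rows. Below is a minimal Python sketch of the same filter-then-aggregate logic on the sample data. `section_min_max` is a hypothetical name, and `int()` stands in for ROUNDDOWN because both truncate toward zero.

```python
data = [  # (column A section, column B, column C) from A1:C10
    (1, 23, 57), (1.1, 42, 91), (1.2, 35, 100), (1.3, 39, 80), (1.4, 28, 51),
    (1.5, 30, 96), (2, 33, 52), (2.1, 11, 73), (2.2, 48, 80), (2.3, 16, 59),
]


def section_min_max(data, section):
    """MIN of column B and MAX of column C over rows where ROUNDDOWN(A, 0) == section."""
    in_section = [(b, c) for a, b, c in data if int(a) == section]  # int() truncates like ROUNDDOWN(x, 0)
    return min(b for b, _ in in_section), max(c for _, c in in_section)


for section in (1, 2):
    print(section, *section_min_max(data, section))
```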
I have a table for which I want to find the percentage of the total group that is greater than and less than a baseline, based on the weights of each group.
Here is my example table:
Benchmark | GRP 1 | GRP 2 | GRP 3 | GRP 4
10 | 10 | 11 | 10 | 12
14 | 12 | 15 | 11 | 15
17 | 11 | 17 | 13 | 16
18 | 14 | 15 | 14 | 17
Population | 40 | 45 | 30 | 80
What I want to find out is, for each level of the benchmark, what % of the total population of all four groups is above or below the benchmark value.
I have tried various SUMPRODUCT and SUMIFS formulas but can't seem to get it to work.
Let me know your thoughts!
Thanks as always!
Assuming your sample data is in A1:E7, put the following formula into B9 and press Ctrl+Shift+Enter to record it as an array formula:
=SUM(IF(B$2:B$5>$A$2:$A$5,1,0))/COUNTA($A$2:$A$5)
This can then be copied across under the other groups. Below is how it works for me.
Note: The array formula will display with braces ({...}) around it but you do not type these.
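For one group, the array formula counts how many benchmark rows that group's value exceeds and divides by the number of benchmark rows. It does not use the Population row. The Python sketch below reproduces that calculation on the sample table. It also adds one possible population-weighted reading of the question, which is an assumption and not part of the answer: for each benchmark row, the share of the combined population whose group value is strictly above the benchmark. All function names are hypothetical.

```python
benchmarks = [10, 14, 17, 18]
groups = {  # group name -> value on each benchmark row
    "GRP 1": [10, 12, 11, 14],
    "GRP 2": [11, 15, 17, 15],
    "GRP 3": [10, 11, 13, 14],
    "GRP 4": [12, 15, 16, 17],
}
population = {"GRP 1": 40, "GRP 2": 45, "GRP 3": 30, "GRP 4": 80}


def share_of_rows_above(values, benchmarks):
    """What the array formula computes: SUM(IF(value > benchmark, 1, 0)) / COUNTA(benchmarks)."""
    above = sum(1 if v > b else 0 for v, b in zip(values, benchmarks))
    return above / len(benchmarks)


def weighted_share_above(row):
    """Assumed interpretation: share of total population in groups above the row's benchmark."""
    total = sum(population.values())
    above = sum(population[g] for g, values in groups.items() if values[row] > benchmarks[row])
    return above / total


for name, values in groups.items():
    print(name, share_of_rows_above(values, benchmarks))
for row, b in enumerate(benchmarks):
    print(b, round(weighted_share_above(row), 3))
```

Values equal to the benchmark count as neither above nor below in both functions. Decide explicitly how ties should be treated before computing a "below" share.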