How to compute the median value of sales figures?

MySQL has an AVG function but no MEDIAN function, so I need to find a way to compute the median value of sales figures.
I understand that the median is the middle value. However, it isn't clear to me how to handle a list with an even number of values. How do you determine which value to select as the median, or is further computation needed? Thanks!

I'm a fan of including an explicit ORDER BY statement:
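SELECT t1.val as median_val
FROM (
-- user variables are written with @; the alias is row_num because ROW_NUMBER is a reserved word in MySQL 8.0+
SELECT @rownum:=@rownum+1 as row_num, d.val
FROM data d, (SELECT @rownum:=0) r
WHERE 1
-- put some where clause here
ORDER BY d.val
) as t1,
(
SELECT count(*) as total_rows
FROM data d
WHERE 1
-- put same where clause here
) as t2
WHERE 1
-- for an even row count this selects the upper of the two middle rows
AND t1.row_num=floor(total_rows/2)+1;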
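On the even-count part of the question: the usual convention is to average the two middle values, whereas selecting row floor(total_rows/2)+1 returns the upper of the two. A minimal Python sketch of both conventions follows; the function names and the sample figures are made up for illustration.

```python
def median(values):
    """Median with the usual convention: average the two middle values when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def upper_median(values):
    """Value at 1-based row floor(n/2)+1, the row the SQL query selects."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("median of an empty list is undefined")
    return ordered[len(ordered) // 2]  # zero-based index of row floor(n/2)+1


sales = [10, 20, 30, 40]      # hypothetical sales figures
print(median(sales))          # average of 20 and 30
print(upper_median(sales))    # upper middle value, 30
```

For an odd number of rows the two functions agree, so the choice only matters for even counts.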

Related

DAX Formula for - Closing Balance based on last non-blank by non-date column

I'm trying to show the total closing balance by month for the dataset below:
[Tranche] [Maturity Date] [Balance]
T1 1-Jan-16 1000
T2 2-Jan-16 200
T3 1-Jan-16 3000
T3 3-Jan-16 2900
T1 31-Jan-16 1000
T2 1-Feb-16 200
T3 31-Jan-16 3000
T3 2-Feb-16 2900
I have joined the dataset (table LoanSched) with a dates lookup table (Dates).
Here's the DAX calculated field formula:
=CALCULATE (
SUM(LoanSched[Balance]),
FILTER ( Dates, Dates[FullDate] = MAX(LoanSched[Maturity Date]) )
)
However, I get the result below, which is incorrect. Since Tranche T2's balance ends on a date earlier than T3's, its balance is excluded from the monthly total. The way the dataset works, the total balance should include the balances that appear on the last date of each month for each tranche. I'm missing the tranche condition.
I need to calculate the correct balances (highlighted in yellow) below:
So what you have here is a form of semi-additive measure, though I don't quite understand how that grand total relates to the subtotals. What it says to me is that each "tranche-maturity date" combination is an independent instrument, so it doesn't entirely make sense to use traditional time intelligence; instead of months, that could just as well be some other arbitrary hierarchy. Is that correct?
Anyway, based on your criteria, what you want is basically:
1. a calculated measure that returns the last non-blank balance within a month for a given tranche;
2. another measure which adds up that measure for each tranche to get a "maturity month balance";
3. and then a final measure that adds up that measure for each maturing month to get a "total balance".
For #1, this is the traditional formula:
TrancheEndingBalance := CALCULATE (
SUM ( ClosingBalance[Balance]),
LASTNONBLANK (
Dates[FullDate],
CALCULATE ( SUM ( ClosingBalance[Balance] ) )
)
)
And then #2 is just a SUMX across tranches:
MaturityMonthEndingBalance :=
SUMX ( VALUES ( ClosingBalance[Tranche] ), [TrancheEndingBalance] )
And #3 a SUMX across maturity months:
TotalEndingBalance :=
SUMX ( VALUES ( Dates[MonthYear] ), [MaturityMonthEndingBalance] )
Please note these measures essentially only work for the layout you've described, but it sounds like that's the only way to get at the correct balance for a given set of tranches and maturity dates, so form follows function, as it were.
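To make the three layers concrete, here is a rough Python sketch of the same logic applied to the rows from the question. It is only an analogy for the DAX measures, not how the engine evaluates them; the function names are hypothetical, and it assumes every row has a non-blank balance.

```python
from collections import defaultdict
from datetime import date

# Rows from the question: (tranche, maturity date, balance).
ROWS = [
    ("T1", date(2016, 1, 1), 1000),
    ("T2", date(2016, 1, 2), 200),
    ("T3", date(2016, 1, 1), 3000),
    ("T3", date(2016, 1, 3), 2900),
    ("T1", date(2016, 1, 31), 1000),
    ("T2", date(2016, 2, 1), 200),
    ("T3", date(2016, 1, 31), 3000),
    ("T3", date(2016, 2, 2), 2900),
]


def tranche_ending_balance(rows):
    """Step 1: for each (year, month, tranche), sum the balance on that tranche's last date in the month."""
    last = {}  # (year, month, tranche) -> (last date seen, balance summed on that date)
    for tranche, day, balance in rows:
        key = (day.year, day.month, tranche)
        prev = last.get(key)
        if prev is None or day > prev[0]:
            last[key] = (day, balance)
        elif day == prev[0]:
            last[key] = (day, prev[1] + balance)
    return {key: balance for key, (_, balance) in last.items()}


def maturity_month_ending_balance(rows):
    """Step 2: add up the step 1 values across tranches within each month."""
    totals = defaultdict(int)
    for (year, month, _tranche), balance in tranche_ending_balance(rows).items():
        totals[(year, month)] += balance
    return dict(totals)


def total_ending_balance(rows):
    """Step 3: add up the step 2 values across months."""
    return sum(maturity_month_ending_balance(rows).values())


print(maturity_month_ending_balance(ROWS))
print(total_ending_balance(ROWS))
```

With these rows, January picks up T1 and T3 on 31-Jan plus T2 on 2-Jan, which is the tranche condition the original formula was missing.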

Calculating median values in HIVE

I have the following table t1:
key value
1 38.76
1 41.19
1 42.22
2 29.35182
2 28.32192
3 33.66
3 33.47
3 33.35
3 33.47
3 33.11
3 32.98
3 32.5
I want to compute the median for each key group. According to the documentation, the percentile_approx function should work for this. The median values for each group are:
1 41.19
2 28.83
3 33.35
However, the percentile_approx function returns these:
1 39.974999999999994
2 28.32192
3 33.230000000000004
Which clearly are not the median values.
This was the query I ran:
select key, percentile_approx(value, 0.5, 10000) as median
from t1
group by key
It seems to be not taking into account one value per group, resulting in a wrong median. Ordering does not affect the result. Any ideas?
In Hive, the median cannot be calculated directly with the available built-in functions. The query below can be used to find it.
set hive.exec.parallel=true;
select temp1.key,temp2.value
from
(
select key,cast(sum(rank)/count(key) as int) as final_rank
from
(
select key,value,
row_number() over (partition by key order by value) as rank
from t1
) temp
group by key )temp1
inner join
( select key,value,row_number() over (partition by key order by value) as rank
from t1 )temp2
on
temp1.key=temp2.key and
temp1.final_rank=temp2.rank;
The query above assigns a row_number to each value within its key, ordered by value, and then takes the middle row_number of each key, whose value is the median. I have also added the setting "hive.exec.parallel=true", which allows independent tasks to run in parallel.
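The rank arithmetic can be checked outside Hive. The Python sketch below mirrors the query under the assumption that ranks run 1..n within each key: the average rank, truncated to an integer, selects the middle row. The function name is hypothetical and the data is the table from the question.

```python
from collections import defaultdict

# (key, value) rows from table t1 in the question.
T1 = [
    (1, 38.76), (1, 41.19), (1, 42.22),
    (2, 29.35182), (2, 28.32192),
    (3, 33.66), (3, 33.47), (3, 33.35), (3, 33.47),
    (3, 33.11), (3, 32.98), (3, 32.5),
]


def middle_row_per_key(rows):
    """For each key, return the value whose rank equals int(sum(ranks) / count), as the Hive query does."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    result = {}
    for key, values in groups.items():
        ordered = sorted(values)                     # row_number() over (partition by key order by value)
        n = len(ordered)
        final_rank = int(sum(range(1, n + 1)) / n)   # cast(sum(rank)/count(key) as int)
        result[key] = ordered[final_rank - 1]        # ranks are 1-based
    return result


print(middle_row_per_key(T1))
```

Note that for a key with an even number of rows (key 2 here) this selects the lower of the two middle values rather than their average, so it yields 28.32192 and not 28.83.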

Calculated Measure Based on Condition in Dax

I have a requirement in Power Pivot where I need to show value based on the Dimension Column value.
If the value is Selling Price, then the Selling Price amount from Table1 should display; if it is Cost Price, then the Cost Price amount should display; if it is Profit, then ((SellingPrice-CostPrice)/SellingPrice) should display.
My Table Structure is
Table1:-
Table2:-
Required Output:-
I tried the option below:
1. Calculated Measure:=If(Table[Category]="CostPrice",[CostValue],If(Table1[category]="SellingPrice",[SalesValue],([SalesValue]-[CostValue]/[SalesValue])))
*[CostValue]:=Calculate(Sum(Table1[Amount]),Table1[Category]="CostPrice")
*[Sales Value]:=Calculate(Sum(Table1[Amount]),Table1[Category]="SellingPrice")
I tried this as both a calculated column and a measure, but neither gives me the required output.
Cost:=
CALCULATE(
SUM( Table1[Amount] )
,Table1[Category] = "CostPrice"
)
Selling:=
CALCULATE(
SUM( Table1[Amount] )
,Table1[Category] = "SellingPrice"
)
Profit:=
DIVIDE(
[Selling] - [Cost]
,[Selling]
)
ConditionalMeasure:=
IF(
HASONEFILTER( Table2[Category] )
,SWITCH(
VALUES( Table2[Category] )
,"CostPrice"
,[Cost]
,"SellingPrice"
,[Selling]
,"Profit"
,[Profit]
)
,[Profit]
)
HASONEFILTER() checks that there is filter context on the named field and that the filter context includes only a single distinct value.
This is just a guard to allow our SWITCH() to refer to VALUES( Table2[Category] ). VALUES() returns a table of all distinct values in the named column or table. So, a 1x1 table can be implicitly converted to a scalar, which we need in SWITCH().
SWITCH() is a case statement.
Our else condition in the IF() is just returning [Profit]. You might want something else, but it's unclear what should happen at the grand total level. You can leave this off, and the measure will be blank in IF()'s else condition.
I was thinking about this a little. I'm not sure why you have your categories on rows. Usually the data set would have columns like: item | CostPrice | SellingPrice | Profit. Then you can just use the columns to define your fields. The model becomes easier and more maintainable.
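As a rough illustration of what ConditionalMeasure evaluates to, here is a Python sketch of the same branching. It is only an analogy for the filter-context logic: the function name, the representation of the filter context as a set of selected categories, and the sample amounts are all assumptions.

```python
def conditional_measure(rows, selected_categories):
    """rows: (category, amount) pairs from Table1; selected_categories: Table2 categories in the current filter."""
    cost = sum(amount for category, amount in rows if category == "CostPrice")
    selling = sum(amount for category, amount in rows if category == "SellingPrice")
    # DIVIDE() returns blank instead of raising on a zero denominator; None stands in for blank.
    profit = (selling - cost) / selling if selling else None

    if len(selected_categories) == 1:      # plays the role of the HASONEFILTER() guard
        (only,) = selected_categories
        branches = {"CostPrice": cost, "SellingPrice": selling, "Profit": profit}
        return branches.get(only)          # SWITCH() with no else branch gives blank
    return profit                          # else branch of the IF(), e.g. the grand total


table1 = [("CostPrice", 60), ("SellingPrice", 100)]   # hypothetical amounts
print(conditional_measure(table1, {"CostPrice"}))
print(conditional_measure(table1, {"Profit"}))
print(conditional_measure(table1, {"CostPrice", "SellingPrice", "Profit"}))
```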

How to have a measure lookup a value based on the row and column context?

I need to aggregate a number of multiplications that depend on the row and column context. My best attempt at describing this is in pseudo-code.
For each cell in the Pivot table
SUM
Foreach ORU
Percent= Look up the multiplier for that ORU associated with the Column
SUMofValue = Add all of the Values associated with that Column/Row combination
Multiply Percent * SUMofValue
I tried a number of ways over the last few days and looked at loads of examples but am missing something.
Specifically, what won't work is:
CALCULATE(SUM(ORUBUMGR[Percent]), ORUMAP)*CALCULATE(SUM(Charges[Value]), ORUMAP)
because it sums all of the percentages instead of only the percentages associated with MGR (i.e., the column context).
One way of doing that is by using nested SUMX. Add this measure to ORUBUMGR:
ValuexPercent :=
SUMX (
ORUBUMGR,
[PERCENT]
* (
SUMX (
FILTER ( CHARGES, CHARGES[ORU] = ORUBUMGR[ORU] ),
[Value]
)
)
)
For each row in ORUBUMGR, you multiply its percent by the sum of Value over the rows in Charges whose ORU matches that ORUBUMGR row's ORU. Then you sum those products.
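The nested iteration can be sketched in plain Python as two loops. This assumes both inputs have already been narrowed to the current pivot cell (ORUBUMGR by the column context, Charges by the row context); the function name and the sample rows are hypothetical.

```python
def value_x_percent(orubumgr_rows, charges_rows):
    """orubumgr_rows: (oru, percent) pairs; charges_rows: (oru, value) pairs."""
    total = 0.0
    for oru, percent in orubumgr_rows:                  # outer SUMX over ORUBUMGR
        matching_value = sum(                           # inner SUMX over the filtered Charges rows
            value for charge_oru, value in charges_rows if charge_oru == oru
        )
        total += percent * matching_value
    return total


orubumgr = [("A", 0.5), ("B", 0.25)]                       # hypothetical multipliers
charges = [("A", 100), ("A", 50), ("B", 200), ("C", 999)]  # hypothetical charges
print(value_x_percent(orubumgr, charges))                  # 0.5 * 150 + 0.25 * 200
```

Charges for an ORU with no row in ORUBUMGR (here "C") contribute nothing, which is the behaviour the single-CALCULATE attempt could not express.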

Get the average value between a specified set of rows

Everyone, I'm building a report using Visual Studio 2012
I want to be able to average a group of values between a specific set of rows.
What I have so far is something like this.
=(Count(Fields!SomeField.Value))*.1
and
=(Count(Fields!SomeField.Value))*.9
I want to use those two values to get the average of Fields!SomeField.Value between those two row numbers. Basically, I'm removing the top and bottom 10% of the data and averaging the middle 80%. Maybe there is a better way to do this? Thanks for any help.
Handle it in SQL itself.
Method 1:
Use the NTILE function.
Try something like this:
WITH someCTE AS (
SELECT SomeField, NTILE(10) OVER (ORDER BY SomeField) as percentile
FROM someTable)
SELECT AVG(SomeField) as myAverage
FROM someCTE WHERE percentile BETWEEN 2 and 9
If your dataset is bigger:
WITH someCTE AS (
SELECT SomeField, NTILE(100) OVER (ORDER BY SomeField) as percentile
FROM someTable)
SELECT AVG(SomeField) as myAverage
-- buckets 1-10 and 91-100 are the bottom and top 10%
FROM someCTE WHERE percentile BETWEEN 11 and 90
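Method 2:
SELECT Avg(SomeField) myAvg
From someTable
Where SomeField NOT IN
(
-- each TOP ... ORDER BY sits in a derived table, since ORDER BY is not allowed directly inside a UNION branch
SELECT someField From (SELECT Top 10 percent someField From someTable order by someField ASC) lowest
UNION ALL
SELECT someField From (SELECT Top 10 percent someField From someTable order by someField DESC) highest
)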
Note:
Test for boundary conditions to make sure you are getting what you need. If needed, tweak the above SQL code.
For NTILE: make sure your NTILE parameter is less than or equal to the number of rows in the table.
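For checking the boundary conditions, a small reference implementation can help. The Python sketch below drops the lowest and highest 10% of rows by position and averages the rest. It is an assumption about the intended result, not a reproduction of either SQL method: NTILE bucket sizes and the value-based NOT IN filter can differ from it when the row count is not a multiple of 10 or when boundary values are duplicated. The function name is hypothetical.

```python
def trimmed_mean(values, trim_fraction=0.1):
    """Average of what remains after dropping trim_fraction of the rows from each end of the sorted list."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)   # number of rows to drop from each end
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("no values left after trimming")
    return sum(kept) / len(kept)


print(trimmed_mean(range(1, 11)))   # drops 1 and 10, averages 2..9
```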

Resources