I've been building a PowerPivot workbook to calculate a weighted standard deviation. The quality metric Q is weighted by the Product Tons for each record to get weighted statistics for variable periods (i.e. weeks, months, years).
The problem is that when I use this measure:
Product Q-St.d:=SQRT((SUMX('Table',((([PRODUCT_Q]-[W_Avg_Q]))^2)*[TOTAL_PRODUCT_TONS]))/(((COUNTX('Table',[Production_Q])-1)*[Product Tons])/COUNTX('Table',[Production_Q])))
It calculates [W_Avg_Q], the weighted average for Q, for each row as it iterates instead of once for the whole context. I've learned pretty much all my DAX on the job or on this site, so I'm hoping there's some command to make the weighted average calculate first. Does anyone know such a command, or another method of getting a weighted standard deviation out of DAX?
I think what you want to do is to declare [W_Avg_Q] a variable and then use it in your formula.
Product Q-St.d :=
VAR WtdAvg = [W_Avg_Q]
RETURN
    SQRT(
        SUMX('Table', (([PRODUCT_Q] - WtdAvg) ^ 2) * [TOTAL_PRODUCT_TONS])
            / (((COUNTX('Table', [Production_Q]) - 1) * [Product Tons])
                / COUNTX('Table', [Production_Q]))
    )
This way it gets calculated once in the proper context and then stored and reused within the formula.
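To check the numbers outside of DAX, here is a minimal Python sketch of the same calculation. It assumes that [Product Tons] is the total of the tons in the current context; the sample values and the name weighted_std are made up for illustration.

```python
import math


def weighted_std(values, weights):
    """Weighted standard deviation with the weighted mean computed once."""
    n = len(values)
    total_weight = sum(weights)  # assumed to correspond to [Product Tons]
    # Computed once for the whole set, like the VAR in the measure.
    weighted_mean = sum(v * w for v, w in zip(values, weights)) / total_weight
    weighted_sq_dev = sum(
        w * (v - weighted_mean) ** 2 for v, w in zip(values, weights)
    )
    # Denominator mirrors the measure: ((N - 1) * total weight) / N.
    return math.sqrt(weighted_sq_dev / ((n - 1) * total_weight / n))


q = [10.0, 12.0, 14.0]   # hypothetical quality readings
tons = [1.0, 2.0, 1.0]   # hypothetical product tons per record
print(weighted_std(q, tons))
```

With equal weights the formula reduces to the ordinary sample standard deviation, which is a quick sanity check.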
I'm wondering how I would calculate the sum of a cumulative probability in Excel.
I have attached the column of values that I am working with. Any help is appreciated.
I have tried finding the mean of the values and then the standard deviation, then using the normal distribution function and summing those values, but it doesn't seem to produce the right value.
You can use NORM.DIST(x, mean, standard_dev, cumulative), which lets you specify the mean and the standard deviation. If the last argument is TRUE, it returns the cumulative probability. This assumes that your data follow a normal distribution. If you are not sure about that, run a normality test first to confirm it (many natural phenomena are approximately normally distributed).
For the mean, you can use the AVERAGE function, and for the standard deviation, STDEV.S.
So in cell D4, enter the following formula to calculate the cumulative probability for 0.25:
=NORM.DIST(D3,D1,D2, TRUE)
So if your data follow a normal distribution, the cumulative probability for 0.25 will be 0.361494.
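For a cross-check outside Excel, the same steps can be sketched in Python with the standard library. The data below are hypothetical (the original column is not reproduced here), so the result will not match the 0.361494 above; normal_cumulative is just an illustrative name.

```python
from statistics import NormalDist, mean, stdev


def normal_cumulative(x, data):
    """Cumulative probability of x under a normal fit to data."""
    mu = mean(data)       # same role as AVERAGE
    sigma = stdev(data)   # sample standard deviation, same role as STDEV.S
    # Same role as NORM.DIST(x, mu, sigma, TRUE).
    return NormalDist(mu, sigma).cdf(x)


values = [0.10, 0.20, 0.30, 0.40, 0.50]  # hypothetical column of values
print(normal_cumulative(0.25, values))
```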
Overall goal for my report:
I am creating a pivot table in Excel right now (eventually in Power BI) that will update daily through data imports to reflect weekly changes in sales. I then want to perform a Z-score analysis on each week to see if there are any outliers in the data.
What I will need to do is be able to subtract a mean of all of the data from each weekly set of sales, then divide it by the standard deviation.
Current thought process for data:
If I can get the grand total at the bottom, could I get that as a value entered for each row in another column? Can I do it as a total average and a total standard deviation? I can do it outside of a pivot table, but I want something in a pivot table so it auto-populates.
Current Data
Desired Data
You can tackle this in at least two ways:
1. Dynamic calculation using measures
2. Back-end calculation
The first approach consists of defining measures in the following context:
CALCULATE([MEASURES], ALL('Calendar'), VALUES('Calendar'[Year]), VALUES('Calendar'[Month]))
This lets you calculate a measure in a context that considers the entire month. Therefore, for each day you would have a measure that gives you the stdev of the entire month.
Pro: dynamic; fast to implement; can be based on measures already defined
Cons: more calculation in the front end slows down your report
The second approach consists of pre-calculating these values in the back end. Here you have two options:
a. Data source: add these new columns in the data source (e.g. the database)
   Pro: best practice and a clean approach
   Cons: static; cannot use measures already defined
b. Calculated column in DAX: define the value as a calculated column in the back end of Power BI, using the same structure defined for the measure:
   CALCULATE([MEASURES], ALL('Calendar'), VALUES('Calendar'[Year]), VALUES('Calendar'[Month]))
   Pro: fast to implement
   Cons: static; really against best practice
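To make the effect of that filter context concrete, here is a small Python sketch that attaches the standard deviation of the whole month to every day in it. The dates, amounts, and the choice of the population standard deviation are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import date
from statistics import pstdev


def month_stdev_per_day(sales):
    """Map each day to the standard deviation of its whole month."""
    by_month = defaultdict(list)
    for day, amount in sales.items():
        by_month[(day.year, day.month)].append(amount)
    # One value per (year, month), like widening the context to the month.
    month_stdev = {key: pstdev(amounts) for key, amounts in by_month.items()}
    return {day: month_stdev[(day.year, day.month)] for day in sales}


daily_sales = {  # hypothetical daily totals
    date(2023, 1, 1): 10.0,
    date(2023, 1, 2): 14.0,
    date(2023, 2, 1): 5.0,
    date(2023, 2, 2): 5.0,
}
print(month_stdev_per_day(daily_sales))
```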
In Power BI I used the following measures (replace 'stack' with your table name):
Total StdDev = CALCULATE(STDEV.P(stack[sum of sales]), ALL(stack))
TotalMean = CALCULATE(AVERAGE(stack[sum of sales]),ALL(stack))
Z score = (SUM(stack[sum of sales]) - [TotalMean])/[Total StdDev]
I used AVERAGE to calculate the mean and I get a different result from yours (please see below).
If you can share the formula that you used to calculate 'TotalMean', maybe I can update it.
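The arithmetic behind the three measures can be sketched in Python as follows. The weekly figures are hypothetical; pstdev is used because the measure above uses STDEV.P (population standard deviation).

```python
from statistics import mean, pstdev


def z_scores(weekly_sales):
    """Z score of each week against the mean and std dev of all weeks."""
    total_mean = mean(weekly_sales)    # same role as [TotalMean]
    total_std = pstdev(weekly_sales)   # same role as [Total StdDev]
    return [(s - total_mean) / total_std for s in weekly_sales]


weekly = [100.0, 120.0, 80.0, 100.0]  # hypothetical weekly sums of sales
print(z_scores(weekly))
```

If the mean comes out different from an Excel result, it is worth checking whether both sides average over the same rows.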
This may be more of a statistics question, and I'd like to find a solution with Excel. I'd rather use simple VBA if any coding is necessary.
Is there a way to estimate the percentile of a specific data point in a skewed distribution? I don't need exact percentiles and only need a reasonable estimate. I work on analyses that rely on weighted average benchmarks reported by multiple sources. All of my sources report the 25th, 50th, 75th, and 90th percentiles as well as the mean and standard deviation. We use these benchmarks to set a target range, and our goal is for our results from a specific analysis to land somewhere within the published percentiles. I'm often asked to indicate what percentile our specific result is at, and all I can provide is broad ranges like 25th-50th, etc. So, I'm then asked to use simple extrapolation to determine the specific percentile of the specific result, and I know that using this method is inaccurate.
Mean and median differ in 99% of cases in my data set, but % difference between mean and median on average is only 6%. Only about 10% of cases have mean and median with greater than 10% difference.
For the 90% of cases with relatively low % difference between mean and median, can I assume the normal distribution?
For cases with higher % difference between mean and median, can I make an assumption that will help me estimate more accurately? I could for these cases just use the normal distribution and send my percentile estimate along with a note indicating that the estimate is likely off in one direction or another, but I'd rather give a better estimate.
Responding to cybernetic.nomad:
First, thanks for commenting! Second, it doesn't seem to work. I think I don't have enough data. The attached image shows an example. The first 5 rows show one set of my weighted average benchmarks for a single case. Below that, I added two lines, one of them with my "target" amount. This could be any number but, to test out the formula you suggested, I entered my 50th percentile weighted average. The row below that has the result of the formula =PERCENTRANK.EXC(25th:90th,target). The result should be 0.5 but it's not, so I don't think this works.
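One possible explanation, sketched in Python: PERCENTRANK.EXC treats the four benchmark figures as four ordinary data points, so an exact match is ranked as position / (n + 1) rather than by the percentile label attached to it. The function below is an approximation of that behaviour (it ignores Excel's truncation to three significant digits), and the benchmark values are hypothetical.

```python
def percentrank_exc(values, x):
    """Approximate Excel's PERCENTRANK.EXC: rank / (n + 1), interpolated."""
    data = sorted(values)
    n = len(data)
    if x < data[0] or x > data[-1]:
        raise ValueError("x is outside the data range (Excel returns #N/A)")
    for i, v in enumerate(data):
        if v == x:
            # Exact match: 1-based position divided by n + 1.
            return (i + 1) / (n + 1)
        if v > x:
            # Linear interpolation between the two neighbouring points.
            lower = data[i - 1]
            return i / (n + 1) + (x - lower) / (v - lower) / (n + 1)


benchmarks = [40.0, 50.0, 65.0, 80.0]  # hypothetical 25th, 50th, 75th, 90th
print(percentrank_exc(benchmarks, 50.0))  # second of four points: 2 / 5
```

Under this reading, the 50th percentile benchmark is simply the second of four values, which would give 0.4 instead of 0.5.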
Somehow my pivot table is currently crashing. The table is structured as follows:
Area, Code and QTY1 are defined by the data model. QTY2, Min and Performance are measures.
QTY2 sums all codes for defined filters from another table. Min calculates the lower value of QTY1 and QTY2.
Measure Performance calculates the ratio of Min to QTY1.
Unfortunately, the total that Excel shows for Min is incorrect. The mean and the sum of Performance are also wrong.
Formula QTY2: =CALCULATE(SUM(tbl2[QTY]);FILTER(tbl2; tbl2[TYP]<>"11"))
Formula MIN: =MIN([QTY2];[QTY1])
Formula Performance: =[MIN]/[QTY1]
What exactly am I doing wrong?
How can the mistake be avoided?
Edit:
The following approach sums up the right volume for [Min].
But it is not showing the accurate average of 37%. It sums up the divided value.
[Performance]=SUMX(tbl_General;DIVIDE([QTY2];[QTY1];BLANK()))
[Min]=SUMX(tbl_General;(MIN([Qty1];[Qty2])))
Why is that so?
Best regards
Joshua
So this is an example of where SUMX is needed.
You've stumbled on the difference between the aggregation of an expression and the sum of values.
Something like SUMX(dim_Tbl, DIVIDE([MIN], [QTY1], BLANK())) should work
EDIT:
After seeing the edit on the OP, the following measures should work.
Min = SUMX(tbl_General;(MIN([Qty1];[Qty2])))
and
Performance = DIVIDE([QTY2];[Min];BLANK())
In general, 'X' measures iterate over a table, evaluate an expression for each row, and then sum the row results, whereas 'normal' measures are recalculated from scratch in the total. You want your Performance measure recalculated for the total, so don't use SUMX there. You want your Min measure to be the sum of the row-level values, so do use SUMX.
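The difference is easy to see with two made-up rows. The Python sketch below uses hypothetical quantities and follows the question's definition of Performance (Min divided by QTY1); it only illustrates the arithmetic, not how the DAX engine evaluates the measures.

```python
# Hypothetical rows standing in for the table: one record per code.
rows = [
    {"qty1": 10, "qty2": 4},
    {"qty1": 5, "qty2": 8},
]

total_qty1 = sum(r["qty1"] for r in rows)   # 15
total_qty2 = sum(r["qty2"] for r in rows)   # 12

# Plain aggregation: the total cell compares the two totals.
min_of_totals = min(total_qty1, total_qty2)  # 12

# SUMX-style iteration: take the minimum per row, then add the results.
sum_of_row_mins = sum(min(r["qty1"], r["qty2"]) for r in rows)  # 4 + 5 = 9

# Ratio recalculated for the total: one division on the totals.
performance_total = sum_of_row_mins / total_qty1  # 9 / 15

# Iterating the ratio instead adds up the per-row ratios.
sum_of_row_ratios = sum(
    min(r["qty1"], r["qty2"]) / r["qty1"] for r in rows
)  # 0.4 + 1.0

print(min_of_totals, sum_of_row_mins, performance_total, sum_of_row_ratios)
```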
I'm moving from an old Excel-based reporting system to Power BI and I'm finding some discrepancies.
When I take the same listed percentages from Power BI and calculate the average in Excel, it's different (see below):
I have no idea what's going on here... I thought maybe it was a rounding issue, but it's just not making sense to me.
When I export the PowerBI data from the table and then average it in excel it's a different number.
That shouldn't happen, right?
Am I going crazy here?
When I calculate it manually I get 99.828% which should round to 99.83% as Excel shows.
It seems to me that the PowerBI average is simply incorrect.
Edit:
After applying RADO's answer, here are my results (I dropped the Round and it seems to work - I think maybe it's an issue with my data - not his methodology):
There is a critical difference between how Excel and DAX calculate averages.
Excel takes the average of the rounded numbers in each row.
DAX (Power BI) calculates averages independently in each cell. This means that the "total" cell is calculated not as the average of the rounded scores, but as the average of the non-rounded underlying values of the entire data set, which is then rounded. This is how DAX operates conceptually: each calculation is always done independently of the other calculations in the table.
The way to fix it:
In Power BI, rewrite your DAX formula to use AVERAGEX instead of AVERAGE. For example:
Correctly Averaged Scores =
AVERAGEX(
    VALUES(TableName[Submitter]),
    ROUND(CALCULATE(AVERAGE(TableName[OrbScore])), 2)
)
Here, we first create a list of distinct "Submitters". Then we iterate over the list, and for each submitter calculate its average and round it to 2 digits. Finally, we calculate the average of the rounded averages, essentially replicating the behaviour of Excel.
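A small Python sketch with made-up scores shows how the two totals can diverge. The submitter names and values are hypothetical.

```python
from statistics import mean

# Hypothetical underlying scores per submitter.
scores = {
    "A": [0.994, 0.998],
    "B": [0.996],
    "C": [0.986],
}

# Average of the rounded per-submitter averages (the AVERAGEX pattern).
rounded_rows = [round(mean(values), 2) for values in scores.values()]
average_of_rounded = mean(rounded_rows)

# Average of all underlying values, rounded once at the end.
all_values = [v for values in scores.values() for v in values]
rounded_overall = round(mean(all_values), 2)

print(rounded_rows, average_of_rounded, rounded_overall)
```

Here the per-submitter averages round to 1.0, 1.0 and 0.99, so their average is above 0.99, while the average of the underlying values rounds to 0.99.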