I am not looking for any code or formula but a rationale/logic.
Background: My data set comes in Date/Time format where a new timestamp is created for each new occurrence of an event.
My goal is to calculate number of occurrences within each hour for a given day. Unfortunately, system does not capture number if occurrences per period as integers. So I have count the number of time an hour value appears within the hour i.e number of times 4 o'clock hour appears. I am currently using Pivot Table in Excel to count the number of times each hour appears. Fields in Rows are hour and dates, and field in Values is count of hour.
Trouble is that I cannot use any summarize functions to get stuff like sum, min, max, percentile, and standard deviation. For example, changing count to sum will only add up all hours. So sum of 4 o'clock hour will return 12 instead of 3. So I am having to use array formulas on pivot table to give me max and min etc.
If I was to use this data in data viz tools like Tableau or Power BI. I won't be able to get very far. I am looking for a suggestions/workaround that can allow me to manipulate my data in a way so it can be used in Pivot Tables in Excel and in data viz tools.
I know my questions is not specific to one tool but I am looking to enhance me understanding of data and data manipulations techniques.
EDIT: Please see attached image
Build a data model, using PowerPivot. Join your fact table to a calendar dimension table. Create a row count measure - you can then summarise that measure to suit (sum, average, min, etc)
Related
I have data in below format. It shows starting and end time of an activity and calculates duration accordingly. The activity is performed through out the day at different times.
I have added a pivot. I want to find out the average duration in a workday or a holiday(Day category). When I am trying to apply average in the current pivot, it is dividing the total duration by the number of sessions in a day.For example in week 1, an activity was done on 4 work days and the total duration for the activity in workdays was 04.19, I want to divide this number by 4 and find out the average time spent on each day but the pivot divides it by 11 which is the total number of sessions in the four days.
Link for data
Steps:
Add a helper column to identify how many unique pairs of Dates/Day Categories there are:
=IF(SUMPRODUCT(($A$2:$A2=A2)*($B$2:$B2=B2))>1,0,1)
You can add extra products to this formula to force extra fields to be unique to be counted as well.
SRC:Simple Pivot Table to Count Unique Values
Add a Calculated Field in the PivotTable that is:
SUM(Duration)/SUM([Helper Column Name]) and include it in the 'Values' section of the PivotTable. Due to the new column being added, you might have to re-create the PivotTable.
This should produce the average in the manner that you want.
In PowerBI, I need to create a Performance Indicator (KPI) measure which evaluates dataset values in a scale from 0 to 1, with target (1) being the MAX value in a 20 years history. It's a national airport trip records open database. The formula is basically [value]/[max value].
My dataset has a lot of fields and I wish I could filter it by any of these fields, with a line chart showing the 0-1 indicator for each month based on the filters.
This is my workaround test solution:
Table 1 - Original dataset: if I filter something here, below tables also update (there are more fields to the left, including YEAR and MONTH
Table 2 - Reference to original dataset, aggregating YEAR-MONTH by the sum of "take-offs" (decolagens)
Table 3 - Reference to above (sum) table, aggregating MONTH by the max of "take-offs" (decolagens)
Table 4 - 'Sum table' merged to 'Max table' by MONTH as new table: then do [Value]/[Max] and we've got the indicator
So if i filter the original dataset by any fields, all other tables update accordingly and the indicators always stays between 0-1, works like a charm.
TL;DR
The problem is: I need to create a dashboard of this on Power Bi. So I need this calculation to be in a measure or another workaround.
My possible solution: by pure DAX code in the measure field, to produce Tables 2 and 3 so I'll divide the month sum values by their month max value (which will both be produced according to PowerBi dashboard slicers) and get the indicator dinamically produced.
I'm stuck at: I don't understand how can I reference a sum/max aggregate table in dax code. Something like = SUM (dataset[take-offs]) / MAX (SUM (dataset[take-offs])). Of course these functions do not work like that, but I hope I made my point clear: how can I produce this four table effect with a single measure?
Other solutions are welcome.
Link to the original dataset: https://www.anac.gov.br/assuntos/dados-e-estatisticas/dados-estatisticos/arquivos/DadosEstatsticos.csv
It's an open dataset, so I guess there's no problem sharing it. Please help! :)
EDIT: please download the dataset and try to solve this. Personally I think it's a quality statistics doubt that will eventually help others. The calculation works, it only needs a Power Bi Measure port.
Add the ALL formula:
Measure = SUMX(ALL('Table'),[Valor])/SUM('Table'[Max])
Example
I'm looking a way to add an extra column in a pivot table that that averages the sum of the count for the months ("Count of records" column) within a time period that is selected (currently 2016 - one month, 2017 - full year, 2018 - 5 month). Every month would have the same number based on the year average, needs to be dynamically changing when selecting different period: full year or for example 4 months. I need the column within the pivot table, so it could be used for a future pivot chart.
I can't simply use average as all my records appear only once and I use Count to aggregate those numbers ("Count of records" column).
My current data looks like this:
The final result should look like this:
I assume that it somehow can be done with the help of "calculated filed" option but I couldn't make it work now.
Greatly appreciate any help!
Using the DataModel (built in to Excel 2013 and later) you can write really cool formulas inside PivotTables called Measures that can do this kind of thing. Take the example below:
As you can see, the Cust Count & Average field gives a count of transactions by month but also gives the average of those monthly readings for the subtotal lines (i.e. the 2017 Total and 2018 Total lines) using the below DAX formula:
=AVERAGEX(SUMMARIZE(Table1,[Customer (Month)],"x",COUNTA(Table1[Customer])),[x])
That just says "Summarize this table by count of the customer field by month, call the resulting summarization field 'x', and then give me the average of that field x".
Because DAX measures are executed within the context of the PivotTable, you get the count that you want for months, and you get the average that you want for the yearly subtotals.
Hard to explain, but demonstrates that DAX can certainly do this for you.
See my answer at the following link for an example of how to add data to the DataModel and how to subsequently write measures:
Using the Excel SMALL function with filtering criteria AND ignoring zeros
I also recommend grabbing yourself a book called Supercharge Excel when you learn to write DAX by Matt Allington, and perhaps even taking his awesome online course, because it covers this kind of thing very well, and will save you significant head-scratching compared to going it alone.
I have about 160k rows of data which consists of temperatures and flow rates. I need to get it down to being a bit more manageable. I would like to have an average temperature and flow rate each hour instead of several values each minute. I'm not much of a programmer but I don't think this would be a problem if it were the case that I had the same amount of measurements each hour, which is not the case. Each hour has between 200-400 data points and I'm not sure how I would go about this problem. The output I want is something like
date - hr - average1 - average 2 ... and so on.
How the sheet looks
Does anyone have a suggestion of how to go about this?
Sincerely,
This is how it looks in a pivot table. I've put in some dummy data which has two different days and two different times to test it.
To get the date without the time I've used
=floor(B2,1)
because date-times are stored as decimal numbers where the whole number part contains the date and the fraction part contains the time.
To get the hour I've just used
=hour(B2)
My test data looks like this
and the resulting pivot table looks like this
I had to use Value Field Settings and Number Format to tidy up the formatting of the date and average columns.
This sheet contains a call log where there are a variable amount of calls per day, with a variable amount of call durations. My goal is to create a chart that shows the number of calls as each day passes, but also show the average call duration per day.
I am getting stuck on the pivot table where I want to gather a column of Average of Call Duration. Problem is I can't figure out a way for to tell Excel to calculate the average on the amount of calls per day. The current table shows Call Duration as hh:mm:ss, and I want to show the Average of Call Duration as mm:ss, especially in the chart.
Attaching same picture of values and how my pivot table currently looks.
Any help would be greatly appreciated.
To summarize, I need to figure out
Tell Excel that the average is based on hh:mm:ss so it doesn't cause
issues (if there's that chance) Tell Excel to calculate averages per
day
Thanks!
(Using Excel for Mac 2011)
http://s22.postimg.org/w1op2ue0h/Pivot.png (values of the table)
http://s22.postimg.org/bt1bh4epd/Values.png (pivot from those values where you see avg duration 1442.00)
In the pivot table, put the date into the row labels box and the duration into the ∑ Values box.
Click on the down arrow where it says Sum of Duration and select Value Field Settings.
Change Sum to Average under Summarize Field By.
Click the Number Format button and choose a Time format e.g. hh:mm:ss (or you can choose mm:ss from the Custom option).
The average duration for each day should be shown in hh:mm:ss or mm:ss format.