I am measuring room utilization (time used/time available) from a data dump. Each row contains the available time for the day and the time used for a particular case.
The image is a simplified version of the data.
If you read the yellow and green highlights (Room 1):
In room 1, there are 200 available minutes on 1/1/2016.
Case 1 took 60 minutes, case 2 took 50 minutes.
There are 500 available minutes on 1/2/2016, and only one case occurred that day, using 350 minutes.
Room 1 utilization = (60 + 50 + 350)/(200 + 500)
The problem with summing the available time is that it double counts the 200 minutes for 1/1/2016, giving: Utilization = (60+50+350)/(200+200+500)
There are hundreds of rows in this data (and there will be multiple data dumps with differing numbers of rows), with multiple cases occurring each day.
I am trying to use a pivot table, but I cannot obtain the 'sum of averages' for a particular room (see image). I am using a macro to pull the numbers out of the grand total column.
Is this possible? Do you see another way to obtain utilization?
(note: there are lots of other columns in the data, like case start, case end, day of week, etc, that are not used in this calculation but are available)
The reason you're getting 300 for both Average of Available Time columns is that the grand total is the average over all rows, not a sum of the averages:
Room 1: (200 + 200 + 500) / 3 = 300
Room 2: (300 + 300 + 300) / 3 = 300
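To see the difference between the grand total (an average over every row) and the sum of per-day averages that utilization needs, here is a minimal Python sketch. It uses the Room 1 values from the question (200 twice for 1/1/2016, 500 for 1/2/2016); the list-of-tuples layout is only an illustrative assumption, not how the pivot table stores data.

```python
from collections import defaultdict
from statistics import mean

# Room 1 rows from the question: (date, available minutes); layout is illustrative
rows = [("1/1/2016", 200), ("1/1/2016", 200), ("1/2/2016", 500)]

# What the pivot's grand total shows: the average over ALL rows
grand_total_avg = mean(avail for _, avail in rows)

# What utilization needs: the average per date, then summed across dates
per_date = defaultdict(list)
for date, avail in rows:
    per_date[date].append(avail)
sum_of_averages = sum(mean(values) for values in per_date.values())

print(grand_total_avg, sum_of_averages)
```

The first value is the 300 seen in the grand total; the second is 700, the denominator the question is after.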
I could not comment on the original question, so my solution is based on a few assumptions.
Assumption #1: The data will always be grouped, e.g. all cases in room 1 on a given day will be in sequential rows.
Assumption #2: The available time column is a single value for the whole day, there will never be differing available times on the same day.
Solution: Use column E as the Actual Available Time. This column uses a formula to check whether the current row's combination of Date + Room + Available Time differs from the previous row's; if it does, the cell contains that row's available time, otherwise 0.
Formula to use in E2:
=IF(AND($A1 = $A2, $B1 = $B2, $C1 = $C2), 0, $C2)
Extend the formula as far down as necessary and then include the new column in your PivotTable data range.
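The helper-column logic can be sketched in Python as follows. Like the formula, it assumes the rows are grouped; the column order (date, room, available, used) and the Room 1 numbers from the question are used purely for illustration.

```python
# Each row: (date, room, available_minutes, used_minutes); layout is hypothetical
rows = [
    ("1/1/2016", "Room 1", 200, 60),
    ("1/1/2016", "Room 1", 200, 50),
    ("1/2/2016", "Room 1", 500, 350),
]

def actual_available(rows):
    """Mirror =IF(AND($A1=$A2,$B1=$B2,$C1=$C2),0,$C2): 0 when a row repeats the previous one."""
    result = []
    prev = None  # plays the role of the header row above E2
    for date, room, avail, _used in rows:
        key = (date, room, avail)
        result.append(0 if key == prev else avail)
        prev = key
    return result

col_e = actual_available(rows)        # helper column values
used = sum(r[3] for r in rows)        # 60 + 50 + 350
utilization = used / sum(col_e)       # 460 / 700
print(col_e, round(utilization, 3))
```

Summing the helper column gives 700 instead of 900, so the 1/1/2016 available time is counted only once.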
End Result
I created a unique reference by combining columns and then used sumif/countif/countif.
So the formula in column E would be:
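=SUMIF(colB,cellB,colC)/COUNTIF(colB,cellB)/COUNTIF(colB,cellB)
where colB holds the unique reference, cellB is the current row's reference, and colC holds the available time. Each row then carries its day's available time divided by the number of cases that day, so summing the column in the pivot table counts each day's available time only once.
With this approach, it doesn't matter whether the data is sorted.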
Extend the formula as far down as necessary and then include the new column in your PivotTable data range.
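A minimal Python sketch of the SUMIF/COUNTIF/COUNTIF idea, assuming the unique reference is the date and room joined with a "|" (the separator is an arbitrary, hypothetical choice). The rows are deliberately unsorted to show that order doesn't matter.

```python
from collections import Counter, defaultdict

rows = [  # (date, room, available_minutes); deliberately unsorted
    ("1/2/2016", "Room 1", 500),
    ("1/1/2016", "Room 1", 200),
    ("1/1/2016", "Room 1", 200),
]

keys = [f"{date}|{room}" for date, room, _ in rows]  # hypothetical unique reference
counts = Counter(keys)                                # COUNTIF(key column, key)
sums = defaultdict(float)
for key, (_, _, avail) in zip(keys, rows):
    sums[key] += avail                                # SUMIF(key column, key, available column)

# SUMIF / COUNTIF / COUNTIF: each row's share of that day's available time
shares = [sums[k] / counts[k] / counts[k] for k in keys]
print(shares, sum(shares))
```

Each 1/1/2016 row gets 100, so the two rows together contribute 200, and the total available time is 700.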
The easiest method I would recommend is this.
=SUM(H:H)-GETPIVOTDATA("Average of Available Time",$G$3)
The first term sums the H column, and the second term subtracts the grand total value. It is a dynamic solution, and will change to fit the size of the pivot table.
My assumptions are that the Pivot Table was originally placed in cell G3.
I've tried a few things on this and settled on a 'cheap' solution. Wanted to know if this can be done directly and more elegantly.
Problem Statement and Sample Data
Assume we have a table in excel with ~200 columns and a large number of rows (~10k).
Sample Data:
identifier val1 val2 val3 ... val200
ID_1 100 102 34 ... 89
We want to add a column at the end that shows how many "moving average" outliers exist. A moving average outlier is defined as a point that is outside the range (mean - 2 * std deviations, mean + 2 * std deviations), where the mean and std dev are calculated using the previous 10 values (hence a moving average outlier).
We will not test the first 10 values. But from val11, the previous 10 values will be used to form the window and we want to test if the value is an outlier.
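The rule above can be sketched in Python for a single row. This is an illustration of the definition, not an Excel solution; it assumes the sample standard deviation (which is what STDEVA computes on purely numeric data) and a strict comparison, matching the < and > in the formula below. The function name is hypothetical.

```python
from statistics import mean, stdev

def count_moving_outliers(values, window=10, k=2):
    """Count points outside (mean - k*sd, mean + k*sd) of the previous `window` values.

    The first `window` values are not tested. Uses the sample standard deviation.
    """
    count = 0
    for i in range(window, len(values)):
        prev = values[i - window:i]
        m, sd = mean(prev), stdev(prev)
        if values[i] < m - k * sd or values[i] > m + k * sd:
            count += 1
    return count

# Hypothetical row: 1..10 form the first window, then 50 (outlier) and 5 (not an outlier)
print(count_moving_outliers(list(range(1, 11)) + [50, 5]))
```

Because the count is computed per row in one pass, this is the quantity the extra column should hold, without needing a second full-size table.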
My Solution so far
I created another table with the same dimensions as the original. In the new table, in the cells from val11 to val200 (for all rows), I put the formula below. Then I can simply sum across each row of the new table.
Assume val11 is on X2 in the "shocks" worksheet (for first row):
=IF(OR(shocks!X2<AVERAGEA(shocks!D2:W2)-2*STDEVA(shocks!D2:W2),shocks!X2>AVERAGEA(shocks!D2:W2)+2*STDEVA(shocks!D2:W2)),1,"")
But if possible, I want to avoid having a second table, since it bloats and slows down the file. Any help would be greatly appreciated.
I need a count of how many date items fall within Data 1 & Data 2
i.e.:
x-1 will have a count of 2
x-2 will have a count of 1
x-3 will have a count of 2
y-1 will have a count of 2
What would be the best way to approach this?
Data 1 | Data 2 | Date
x | 1 | Date 1
x | 1 | Date 1
x | 1 | Date 2
x | 2 | Date 3
x | 2 | Date 3
y | 1 | Date 1
y | 1 | Date 1
I see only one way to interpret with the available information:
To count the number of times Date_to_test falls within Date_1 and Date_2, you could use either a SUM array formula or COUNTIFS with an interim calculation:
sum approach
=SUM(1*($C$2:$C$11<=$B$2:$B$11)*($A$2:$A$11<=$C$2:$C$11))
countifs + interim calc
helper
=1*(C2<=B2)*(A2<=C2)
(additional column, drag down)
countifs
=COUNTIFS($D$2:$D$11,1)
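Both formulas count the rows where Date_1 <= Date_to_test <= Date_2 (columns A, B and C). Here is the same condition as a Python sketch; the dates are made up for illustration.

```python
from datetime import date

# (date_1, date_2, date_to_test) per row, mirroring columns A, B, C; values are hypothetical
rows = [
    (date(2019, 1, 1), date(2019, 1, 10), date(2019, 1, 5)),   # inside the range
    (date(2019, 1, 1), date(2019, 1, 10), date(2019, 1, 10)),  # on the upper bound: counted
    (date(2019, 1, 1), date(2019, 1, 10), date(2019, 1, 11)),  # after the range
    (date(2019, 2, 1), date(2019, 2, 28), date(2019, 1, 31)),  # before the range
]

# Helper column D: =1*(C2<=B2)*(A2<=C2)
helper = [int(test <= end) * int(start <= test) for start, end, test in rows]
# =COUNTIFS(D:D,1), equivalent to the SUM(...) array formula
count = sum(1 for h in helper if h == 1)
print(helper, count)
```

Both bounds are inclusive, just as the <= comparisons in the formulas are.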
Alternative
Like the 'sum' approach, SUMPRODUCT variants (e.g. =SUMPRODUCT(1*($C$2:$C$11<=$B$2:$B$11),1*($C$2:$C$11>=$A$2:$A$11))) are calculation/memory intensive.
Although the COUNTIFS + helper approach puts more 'visible' data on the sheet, the helper values need to be calculated only once, and the COUNTIFS can then be evaluated independently (assuming the helper column is not updated). Depending on your calculation mode and screen-updating preferences, this can make it more memory- and calculation-efficient.
Caveat
If, by some misfortune, I have misinterpreted your question and you are referring to some other means of establishing whether "date items fall within Data 1 & Data 2", then without knowing what this is, there is a very low likelihood of being able to guess it correctly.
Suppose I have 2 tables:
fTransactions
ProdID RepID Revenue
1 1 10
1 1 10
1 2 10
dSalesReps
RepID RepName
1 joe
2 sue
With dSalesReps having the following measures with no filters applied yet:
RepSales:=CALCULATE(SUM(fTransactions[Revenue]))
RepSales2:=SUMX(fTransactions, CALCULATE(SUM(fTransactions[Revenue])))
The first measure performs how I think it would. It goes to the fTransactions table and sums up the Revenue column.
The second measure, after a lot of trial and error to figure it out, seems to sort of group itself on unique rows in fTransactions. In the above example, fTransactions has 2 rows where everything is identical, then a last row where something is different. This seems to result in the following:
(10 + 10) first iteration that sums the first "grouping"
+
(10 + 10) second iteration that sums the first "grouping" again
+
(10) last iteration that sums the second "grouping"
= 20 + 20 + 10 = 50
At least that's how it looks to be operating. I just don't understand why. I thought it would go to the fTransactions table, sum all of Revenue for each iteration, then sum those sums as a final step.
This is caused by something called "context transition" (see the SQLBI article for a more detailed explanation).
In practice, your formula "RepSales2" uses a row context (created by SUMX), which CALCULATE turns into an equivalent filter context. Since your table has no unique key, the filter in each iteration matches, and sums, multiple rows, as explained below.
For the first row, the row context is ProdID=1 AND RepID=1 (and Revenue=10). Turned into an equivalent filter context, it stays the same, but the filter context applies to the whole table, and two rows (the first 2) match this filter.
This is repeated for each row.
It does not happen with the formula "RepSales" because it does not iterate over the rows (as you already noticed).
This is your current situation:
To prove that, just add a rowID to the transaction table:
With the RowID column, it no longer happens, because the equivalent filter context also includes the RowID column, which matches only one row.
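The effect of context transition on this table can be simulated in Python. This is only an analogy of what SUMX + CALCULATE does, not how the DAX engine is implemented: for each row, the "filter" keeps every row whose column values all equal the current row's, and sums Revenue over those matches.

```python
# fTransactions from the question: (ProdID, RepID, Revenue)
transactions = [(1, 1, 10), (1, 1, 10), (1, 2, 10)]

def sumx_with_context_transition(table):
    """Analogy for SUMX(table, CALCULATE(SUM(Revenue))).

    For each row, filter the whole table on every column of that row,
    then sum Revenue (last column) over all rows that match.
    """
    total = 0
    for row in table:
        total += sum(r[-1] for r in table if r == row)
    return total

# Adding a unique RowID column makes every filter match exactly one row
with_row_id = [(i, *row) for i, row in enumerate(transactions, start=1)]

print(sumx_with_context_transition(transactions))  # duplicates match each other
print(sumx_with_context_transition(with_row_id))   # one match per row
```

Without the RowID the result is 20 + 20 + 10 = 50, as observed in the question; with it the result is 30.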
Hope this helps. Use the SQLBI article as a reference; it is an exhaustive guide to this topic.
I'm creating an Excel spreadsheet to track when an item is received as well as when a response to the item having been received has been made (e.g. my mail was delivered at 1:00pm (item received), but I didn't check the mail until 5:00pm (response to the item having been received)).
I need to track both the date and time of the item being received and want to keep these in two separate columns. At the moment this translates to:
Column A: Date item received
Column B: Time item received
Column L: Date item was responded to having been received
Column M: Time item was responded to having been received
In essence, I'm looking to run calculations on the response time between when the item is received and when it has been responded to (e.g. average response time, number of responses in less than an hour, and even things like the number of responses that took between 2 and 3 hours where Bob was the person who responded).
The per-line pseudo code would look something like:
(Lr + Mr) - (Ar + Br) ' where L,M,A,B are the columns and 'r' is the row number.
An example, with the following data:
Row | A | B | L | M
2 | 1/5/19 | 10:00 | 1/5/19 | 12:00
3 | 1/5/19 | 21:00 | 1/6/19 | 1:00
4 | 1/5/19 | 22:00 | 1/5/19 | 23:00
5 | 1/6/19 | 3:00 | 1/6/19 | 4:00
The outcome for the average response time would be 2 hours (average(rows 2-5) = average(2, 4, 1, 1) = 2)
The number of items in each response-time range would be as follows:
(<=1 hour) = 2
(>1 & <=2) = 1
(>2 & <=3) = 0
(>3) = 1
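The per-row pseudo code (Lr + Mr) - (Ar + Br) can be sketched in Python by combining each date with its time and subtracting. The data are rows 2-5 above; the parsing format (month/day/two-digit year, 24-hour time) is an assumption about how the values are written.

```python
from datetime import datetime

# Rows 2-5: (date received, time received, date responded, time responded)
rows = [
    ("1/5/19", "10:00", "1/5/19", "12:00"),
    ("1/5/19", "21:00", "1/6/19", "1:00"),
    ("1/5/19", "22:00", "1/5/19", "23:00"),
    ("1/6/19", "3:00", "1/6/19", "4:00"),
]

def parse(d, t):
    # Date + time combined, the same idea as adding the date and time cells in Excel
    return datetime.strptime(f"{d} {t}", "%m/%d/%y %H:%M")

# (Lr + Mr) - (Ar + Br), converted to hours
hours = [(parse(l, m) - parse(a, b)).total_seconds() / 3600 for a, b, l, m in rows]
average = sum(hours) / len(hours)
print(hours, average)
```

The response times come out as 2, 4, 1 and 1 hours, averaging 2 hours.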
I don't know of (and can't find) a function that will perform this and then let me use it within something like a countifs() or averageifs() function.
While I could do this (fairly easily) in VBA, the practical implementation of this spreadsheet limits me to standard Excel. I suspect that sumproduct() will be fundamental to making this work, but I feel that I need something like a sumsum() function (which doesn't exist), and I'm not familiar enough with sumproduct() to know what to even look for to set something like this up.
If you're not very familiar with SUMPRODUCT() and similar functions, I would suggest a single helper column, like so:
You can see the formula used is:
=((C2+D2)-(A2+B2))
You can probably do all types of calculations on this helper column. Note that the column is formatted as hh:mm. However, if you want to look into SUMPRODUCT(), you could consider these:
Formula in H2:
=SUMPRODUCT(--(ROUND((((A2:A5+B2:B5)-(C2:C5+D2:D5))*-24),2)<=1))
Formula in H3:
=SUMPRODUCT((ROUND((((A2:A5+B2:B5)-(C2:C5+D2:D5))*-24),2)>1)*(ROUND((((A2:A5+B2:B5)-(C2:C5+D2:D5))*-24),2)<=2))
Formula in H4:
=SUMPRODUCT((ROUND((((A2:A5+B2:B5)-(C2:C5+D2:D5))*-24),2)>2)*(ROUND((((A2:A5+B2:B5)-(C2:C5+D2:D5))*-24),2)<=3))
Formula in H5:
=SUMPRODUCT(--(ROUND((((A2:A5+B2:B5)-(C2:C5+D2:D5))*-24),2)>3))
The helper column is the easiest approach. It gives you the time differences that you can then easily analyse however you want. Analysis without the helper column is possible, but the approach differs depending on what type of analysis you want to do.
For the example you provided, which is counting the number of time differences grouped into ranges, you would use the FREQUENCY function:
=FREQUENCY(C2:C5+D2:D5-A2:A5-B2:B5,F2:F4)
In F2:F4 (called the "bins"), enter the upper limit of each range you want to count. The Frequency function counts up to and including the first value, then counts from there up to and including the second value, and so on. Enter the bins as times, e.g. 1:00 for 1 hour.
Note that FREQUENCY is an array-entered, array-returning function. This means you need to first select the range that will contain all output values, G2:G5 in this example, then enter the function, then press CTRL+SHIFT+ENTER.
Also note that Frequency returns an array that is one element larger than the number of bins specified. The extra element is the count of all values greater than the largest bin specified.
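The counting behaviour described above (each bin counts up to and including its upper limit, plus one extra element for values above the largest bin) can be sketched in Python. This is a hypothetical re-implementation for illustration, not Excel's code; it assumes the bins are given in ascending order and are expressed in hours rather than as Excel times.

```python
from bisect import bisect_left

def frequency(values, bins):
    """FREQUENCY-style counts: one per bin (upper limit inclusive) plus one
    extra element for values above the largest bin. Bins must be ascending."""
    counts = [0] * (len(bins) + 1)
    for v in values:
        # index of the first bin >= v; len(bins) if v exceeds every bin
        counts[bisect_left(bins, v)] += 1
    return counts

# Response times in hours from the question, bins at 1, 2 and 3 hours
print(frequency([2, 4, 1, 1], [1, 2, 3]))
```

With three bins the result has four elements: 2 items up to 1 hour, 1 item over 1 and up to 2 hours, 0 over 2 and up to 3 hours, and 1 item over 3 hours.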
I have an excel sheet with times in one column and temperatures in another. I'm trying to work out a formula that will find a certain temperature and measure how long it remained at that temperature.
11:25:29 AM 69.3°C
11:26:29 AM 69.6°C
11:27:29 AM 69.8°C
11:28:29 AM 70.0°C
11:29:29 AM 70.2°C
11:35:29 AM 70.8°C
11:36:29 AM 70.3°C
11:37:29 AM 69.5°C
11:38:29 AM 68.5°C
11:39:29 AM 67.5°C
12:39:29 PM 66.3°C
1:39:29 PM 52.1°C
2:39:29 PM 12.1°C
3:39:29 PM 5.0°C
In this example, I would like to find when it hit 70.0°C and how long it stayed above 70.0°C.
This is a bit of a tough problem because you might have multiple occasions where you go above 70 degrees. In that case, do you want the total time spent above 70 in the entire dataset, or do you want the total time spent above 70 consecutively? And then, how are you determining which of these potential multiple nonconsecutive periods you are talking about?
That said, you can try this. If column A is your datetime, and column B is your temp reading, specify another cell as your temperature reference value ($D$1 here), and in column C starting in row 2 do this:
=(A2-A1)*IF(B2>=$D$1,1,0)
and then copy that all the way down. This calculates the time difference between consecutive measurements; if the temperature at that time is greater than or equal to your reference, it multiplies the difference by 1, otherwise by 0. Because a date/time in Excel is really just a number of days, each cell of column C contains the interval between measurements as a fraction of a day. In other words, 0.25 = 6 hours.
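The column C formula can be sketched in Python using the readings from the question and a threshold of 70.0°C. Like =(A2-A1)*IF(B2>=$D$1,1,0), an interval is counted when the reading at its end is at or above the threshold. The question only gives times, so this assumes all readings fall on the same day.

```python
from datetime import datetime, timedelta

# (time, temperature in °C) from the question
readings = [
    ("11:25:29 AM", 69.3), ("11:26:29 AM", 69.6), ("11:27:29 AM", 69.8),
    ("11:28:29 AM", 70.0), ("11:29:29 AM", 70.2), ("11:35:29 AM", 70.8),
    ("11:36:29 AM", 70.3), ("11:37:29 AM", 69.5), ("11:38:29 AM", 68.5),
    ("11:39:29 AM", 67.5), ("12:39:29 PM", 66.3), ("1:39:29 PM", 52.1),
    ("2:39:29 PM", 12.1), ("3:39:29 PM", 5.0),
]
threshold = 70.0  # plays the role of $D$1

times = [datetime.strptime(t, "%I:%M:%S %p") for t, _ in readings]

# Column C: interval since the previous reading, kept only if this reading >= threshold
column_c = [
    (times[i] - times[i - 1]) if readings[i][1] >= threshold else timedelta(0)
    for i in range(1, len(readings))
]
total = sum(column_c, timedelta(0))
fraction_of_day = total / timedelta(days=1)  # how Excel would store the same duration
print(total, fraction_of_day)
```

For this data the total is 9 minutes (1 + 1 + 6 + 1), i.e. 0.00625 of a day.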
Now that you have that data in column C, you are free to further parse it. You can use a simple SUM(C:C) formula in a cell, or you can go back and sum up individual ranges. I hope this helps.