I have a big dataset in Excel and I want to perform the same operation over and over until the end of all rows.
I have groups of ten values (ten rows) and for each of these groups, I want to only display each row value if it is in the range of the average +- standard deviation of each group. As an example, let's say:
Image showing two groups of ten values. First with the result that I want in red
I already managed to perform the operation that I want for a group of ten values with this formula:
=IF(AND(F2<=(AVERAGE($F$2:$F$11))+(STDEV($F$2:$F$11)),F2>=(AVERAGE($F$2:$F$11))-(STDEV($F$2:$F$11))),F2, "")
Since my data is kind of big (around 5,000 rows) I would like to have a function that I can drag to the other groups of ten values without the need to update the average and standard deviation range.
=IF(F2=MEDIAN(F2,AVERAGE(INDEX(F:F,2+10*INT((ROWS($1:1)-1)/10)):INDEX(F:F,11+10*INT((ROWS($1:1)-1)/10)))+{-1,1}*STDEV(INDEX(F:F,2+10*INT((ROWS($1:1)-1)/10)):INDEX(F:F,11+10*INT((ROWS($1:1)-1)/10)))),F2, "")
Copy down as required.
Regards
Related
I have data in below format. It shows starting and end time of an activity and calculates duration accordingly. The activity is performed through out the day at different times.
I have added a pivot. I want to find out the average duration in a workday or a holiday(Day category). When I am trying to apply average in the current pivot, it is dividing the total duration by the number of sessions in a day.For example in week 1, an activity was done on 4 work days and the total duration for the activity in workdays was 04.19, I want to divide this number by 4 and find out the average time spent on each day but the pivot divides it by 11 which is the total number of sessions in the four days.
Link for data
Steps:
Add a helper column to identify how many unique pairs of Dates/Day Categories there are:
=IF(SUMPRODUCT(($A$2:$A2=A2)*($B$2:$B2=B2))>1,0,1)
You can add extra products to this formula to force extra fields to be unique to be counted as well.
SRC:Simple Pivot Table to Count Unique Values
Add a Calculated Field in the PivotTable that is:
SUM(Duration)/SUM([Helper Column Name]) and include it in the 'Values' section of the PivotTable. Due to the new column being added, you might have to re-create the PivotTable.
This should produce the average in the manner that you want.
Can I define different aggregation methods for subtotals in different dimension in an Excel pivot table?
The following example shows a result I'm trying to obtain. The metric to aggregate is, let's say, lines of code of a software project. The 2 dimensions in question are Date and Organization. In source data, Organization is broken down into 2 columns, Department and Project, while Date is a single column and Excel makes up the Months/Years summaries automatically when making the ODBC data connection.
A metric such as this one should be aggregated differently along the different dimensions. For the Organization dimension, the subtotal for all projects of the department is the SUM, but in the date dimension, the subtotal for all months of the year is the MAX of any given month (or perhaps AVG, or last etc. but certainly not SUM).
I've tried to define the different aggregation methods in Excel in the field settings, but it always selects one or the other method for both dimensions. Is there a way to do it, preferably using standard Pivot Table mechanisms or at worst a UDF in Excel?
What I would do to tackle this problem is to add both aggregation functions: sum and max , then hide ( or shrink a lot ) those columns you do not want to display.
in the above example I shrink columns B,D,F and I because of they has values that are out of scope for your requirements.
The "Total Max of Loc" displays a value consistent with the function expressed throughout the entire column: that is "the maximum number of lines of code reached by each project in each department; this could lead to misunderstandings when we observe the values of the subtotals and grand total; i.e: The "Grand Total - Total Max of Loc" is not the "Total Max of Sum of Loc": in the example, it shows 18 which represents the absolute maximum value of Loc in a Project in each Department; In the same way the Total Max of Loc for Department 2 is 18 and form Department 1 is 12
When requested a different behavior as expressed in comment to this answer, I think we are entering into the strong customizations space and some solution could be found by writing custom macro and by leveraging the getpivotdata function or, if it can be acceptable for your case, simply by the addition of a new column with the max()formula and possibly hiding the column "Total Max of Loc"
I am not looking for any code or formula but a rationale/logic.
Background: My data set comes in Date/Time format where a new timestamp is created for each new occurrence of an event.
My goal is to calculate number of occurrences within each hour for a given day. Unfortunately, system does not capture number if occurrences per period as integers. So I have count the number of time an hour value appears within the hour i.e number of times 4 o'clock hour appears. I am currently using Pivot Table in Excel to count the number of times each hour appears. Fields in Rows are hour and dates, and field in Values is count of hour.
Trouble is that I cannot use any summarize functions to get stuff like sum, min, max, percentile, and standard deviation. For example, changing count to sum will only add up all hours. So sum of 4 o'clock hour will return 12 instead of 3. So I am having to use array formulas on pivot table to give me max and min etc.
If I was to use this data in data viz tools like Tableau or Power BI. I won't be able to get very far. I am looking for a suggestions/workaround that can allow me to manipulate my data in a way so it can be used in Pivot Tables in Excel and in data viz tools.
I know my questions is not specific to one tool but I am looking to enhance me understanding of data and data manipulations techniques.
EDIT: Please see attached image
Build a data model, using PowerPivot. Join your fact table to a calendar dimension table. Create a row count measure - you can then summarise that measure to suit (sum, average, min, etc)
I have an excel table:
JobA .03445
JobB .01366
JobC .93271
JobD .6335
Plus 65,000 more.
What I need to do, is to create four equal buckets based on the values. where the sum of all Jobs in each bucket come as close to the other three buckets as possible.
Is there a way to do this in Excel?
Thanks
You can try this approach based on the incremental percentage. So you sum each incremental job until your sum reaches 25% of total values (that is BucketA), jobs from 25-50% will be "BucketB", 50-75% "BucketC", and rest will go into "BucketD". Sum of values in each bucket should be pretty close since you have 65k of values.
enter this formula
=IF(SUM($B$2:B2)/SUM($B$2:$B$100000)<0.25,"BucketA",IF(SUM($B$2:B2)/SUM($B$2:$B$100000)<0.5,"BucketB",IF(SUM($B$2:B2)/SUM($B$2:$B$100000)<0.75,"BucketC","BucketD")))
in cell C1 and drag it to the bottom.
There's lots of studies into algorithms that solve these types of problems. Your problem is actually the exact same format as the equal piles example in this article:
https://simple.wikipedia.org/wiki/P_versus_NP#Example
Considering the volume you're working with and the fairly narrow range of values, you could get a fairly good approximate solution by simply doing this:
Sort all items in descending order by value
In an adjacent column, put 1, 2, 3 and 4 against the first 4 values.
Use autofill to repeat that pattern against all values
You should now have 4 groups of fairly equal value
I have a set of data taken at 10 millisecond intervals. I need to group this data into 15 minute blocks (9,000 milliseconds) and obtain the log average from one column and the 10th lowest percentile from the other (my data is in decibels).
Is there a way I can split the data every 9,000th row and form groups from this, and then apply a formula to each group - without having to repeat the process for each group? i.e. some way to set it up so I can just drag down results (I have two weeks of data.)
To calculate average of rows 1-9000 in column A you can use following formula:
=AVERAGE(INDEX(A:A,(ROWS($1:1)-1)*9000+1):INDEX(A:A,(ROWS($1:1))*9000))
If you drag it down, next one will calculate rows 9001-18000 and so on.