Taking the top values and averaging from data list in Excel - excel

I have a data list in Excel. I want to take the top 3 values for each number and quickly get the average of those 3 values. I often work with lists of up to 50,000 lines, which at any one time could amount to over 10,000 different column A numbers.
I understand basic pivot tables well enough to get an average once the top 3 values are collected, but I need to find a way to remove all values that are not in the top 3.
I trust this may be an extremely simple ask, or complex and thank you in advance for your help.

You can use the =LARGE(Array, k) formula. For example, =LARGE(B:B, 1) returns the largest number, =LARGE(B:B, 2) the second largest, and so on.
If the column contains many duplicates and you want every occurrence of the top three values, use this formula and fill it down (ROW(A1) supplies k = 1, 2, 3, ... as the formula is copied down):
=IF(LARGE(B:B,ROW(A1))>=LARGE(B:B,COUNTIF(B:B,LARGE(B:B,COUNTIF(B:B,MAX(B:B))+1))+COUNTIF(B:B,MAX(B:B))+1),LARGE(B:B,ROW(A1)),"")
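The formula above works on one column as a whole, while the question asks for the top 3 per column A number. As a rough sketch of that per-number step outside Excel, assuming the list has been exported as (number, value) pairs, the grouping and averaging could look like this in Python. The function name top_n_average and the sample rows are made up for illustration.

```python
from collections import defaultdict
import heapq

def top_n_average(rows, n=3):
    """Average the n largest values for each key in (key, value) rows."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)

    result = {}
    for key, values in groups.items():
        # nlargest returns fewer than n items when a group is short,
        # so small groups are averaged over the values they do have.
        top = heapq.nlargest(n, values)
        result[key] = sum(top) / len(top)
    return result

# Hypothetical sample: column A number, then its value.
rows = [(101, 5), (101, 9), (101, 7), (101, 1), (202, 4), (202, 8)]
averages = top_n_average(rows)
print(averages)
```

Duplicates are treated as separate values here, so a group of 9, 9, 9, 2 averages to 9.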

Related

generate all possibilities from two fixed rows of entries?

I spent hours trying to look for a solution and I feel like I got close but figured asking would be the best way.
Let's say I have a table with 2 columns: column A is an item, and column B is the price of the item. This table has 12 entries. What I would like to do is generate additional tables of 6 entries that do not exceed a certain price. See below for an example. The number I want these tables not to exceed is 50,000.
For example, the first entry could be an apple with a value of 9,000. The apple is in column A and the value in column B.
Can someone help with a way to generate all combinations of 6 items from column a, that do not exceed a combined price of 50,000 in column b?
With 12 items you have 2^12-1, or 4095, possible combinations of products. These can map into the 12 bits of a 12-bit binary number. It is not difficult to write a macro to calculate the total cost of each combination and then filter the result to display results less than or equal to 50,000.
EDIT#1:
Please see:
Best possible combination sum of predefined numbers that smaller or equal NN
Listing all possible combination without repetition,VBA
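The 4095 figure counts subsets of every size. If only sets of exactly 6 items are wanted, there are 924 of them (12 choose 6). As a sketch of the enumerate-and-filter idea described above, written in Python rather than as a macro, something like the following would do it. The function name affordable_sets and every price except the apple at 9,000 are invented for illustration.

```python
from itertools import combinations

def affordable_sets(prices, size=6, budget=50_000):
    """Return (names, total) for every size-item combination within budget."""
    results = []
    for names in combinations(sorted(prices), size):
        total = sum(prices[name] for name in names)
        if total <= budget:
            results.append((names, total))
    return results

# Hypothetical 12-item price list; only the apple at 9,000 is from the question.
prices = {
    "apple": 9000, "banana": 4000, "cherry": 12000, "date": 7000,
    "elderberry": 15000, "fig": 3000, "grape": 8000, "honeydew": 11000,
    "kiwi": 6000, "lemon": 5000, "mango": 10000, "nectarine": 14000,
}
matches = affordable_sets(prices)
print(len(matches), "of 924 six-item combinations cost 50,000 or less")
```

Each entry in matches is a tuple of item names plus its combined price, so the list can be written back to a sheet as one row per combination.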

How can I divide a sum into 50 columns in Excel?

I will explain with a smaller number of columns.
Let's say we have a sum of 24 and we want to distribute it randomly into 10 separate columns. We should get a result like the one I wrote below.
Is there a formula in Excel like this?
Thanks in advance.
Ok. Here is what I would do...
For each required split, select a random number between zero and the remainder of the distribution qty, multiplied by the percentage of how many splits have already been calculated. This prevents the first few splits being very high, and the rest being zero.
I would also add a check for the very last split to make sure that it equals whatever is left of the original distribution qty.
Here is an image for illustration and the formula that I have used:
=IF($A2=MAX($A:$A),$F$1-SUM($B$1:$B1),RANDBETWEEN(0,($F$1-SUM($B$1:$B1))*($A2/MAX($A:$A))))
Hopefully, this isn't too complex to understand. If you need further explanation, please let me know.
You can simply change the distribution qty in the yellow box, and if you want more splits, all you need to do is drag down columns A & B to the required number.
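For anyone who wants to check the logic outside a worksheet, here is a sketch of the same rule in Python. It assumes the upper bound passed to RANDBETWEEN is rounded down to a whole number, and the function name random_split is made up for illustration.

```python
import random

def random_split(total, parts):
    """Split total into `parts` non-negative integers that sum to total."""
    shares = []
    for i in range(1, parts + 1):
        remaining = total - sum(shares)
        if i == parts:
            # Last split: take whatever is left so the shares add up exactly.
            shares.append(remaining)
        else:
            # Cap the draw at remaining * (i / parts), as in the formula,
            # so the early splits cannot swallow the whole amount.
            upper = int(remaining * i / parts)  # rounding down is an assumption
            shares.append(random.randint(0, upper))
    return shares

shares = random_split(24, 10)
print(shares, "sum =", sum(shares))
```

Because the cap grows with i, the later splits tend to be larger than the early ones. Shuffling the finished list is one way to remove that ordering if it matters.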

How to create four equal buckets of decimal values

I have an excel table:
JobA .03445
JobB .01366
JobC .93271
JobD .6335
Plus 65,000 more.
What I need to do is create four equal buckets based on the values, where the sum of all jobs in each bucket comes as close to the other three buckets as possible.
Is there a way to do this in Excel?
Thanks
You can try this approach based on the incremental percentage. So you sum each incremental job until your sum reaches 25% of total values (that is BucketA), jobs from 25-50% will be "BucketB", 50-75% "BucketC", and rest will go into "BucketD". Sum of values in each bucket should be pretty close since you have 65k of values.
enter this formula
=IF(SUM($B$2:B2)/SUM($B$2:$B$100000)<0.25,"BucketA",IF(SUM($B$2:B2)/SUM($B$2:$B$100000)<0.5,"BucketB",IF(SUM($B$2:B2)/SUM($B$2:$B$100000)<0.75,"BucketC","BucketD")))
in cell C2 (the first data row, matching the $B$2 start of the ranges) and drag it to the bottom.
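The same cumulative-percentage rule can be sketched in Python to see what it does to a list of values. The function name cumulative_buckets is made up for illustration, and the sample uses only the four values shown in the question.

```python
from bisect import bisect_right
from itertools import accumulate

LABELS = ("BucketA", "BucketB", "BucketC", "BucketD")
THRESHOLDS = (0.25, 0.5, 0.75)

def cumulative_buckets(values):
    """Label each value by the quarter its running share of the total falls in."""
    total = sum(values)
    # bisect_right counts how many thresholds are <= the running share, which
    # matches the nested IF tests (< 0.25, < 0.5, < 0.75, otherwise BucketD).
    return [
        LABELS[bisect_right(THRESHOLDS, running / total)]
        for running in accumulate(values)
    ]

jobs = [0.03445, 0.01366, 0.93271, 0.6335]
labels = cumulative_buckets(jobs)
print(list(zip(jobs, labels)))
```

With only four rows the buckets are badly uneven, because one large value can jump the running share across a whole quarter. The approach relies on there being many rows that are each small relative to the total.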
There are lots of studies into algorithms that solve these types of problems. Your problem is actually the exact same format as the equal piles example in this article:
https://simple.wikipedia.org/wiki/P_versus_NP#Example
Considering the volume you're working with and the fairly narrow range of values, you could get a fairly good approximate solution by simply doing this:
Sort all items in descending order by value
In an adjacent column, put 1, 2, 3 and 4 against the first 4 values.
Use autofill to repeat that pattern against all values
You should now have 4 groups of fairly equal value
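The sort-and-repeat steps above can be sketched in Python as follows. The function name round_robin_buckets is made up, and the 65,000 values are random stand-ins rather than real job data.

```python
import random

def round_robin_buckets(values, buckets=4):
    """Sort descending, then deal values out 1, 2, 3, 4, 1, 2, ... in turn."""
    groups = [[] for _ in range(buckets)]
    for index, value in enumerate(sorted(values, reverse=True)):
        groups[index % buckets].append(value)
    return groups

# Stand-in data: 65,000 random decimals between 0 and 1.
rng = random.Random(0)
jobs = [rng.random() for _ in range(65_000)]
for number, group in enumerate(round_robin_buckets(jobs), start=1):
    print(f"Bucket {number}: {len(group)} jobs, total {sum(group):.2f}")
```

Bucket 1 always receives the largest value of each group of four, so its total comes out slightly highest and bucket 4 slightly lowest. With this many small values the gap should be small, but it is an approximation rather than an optimal split.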

AverageIf and Multiple data strings

I'm involved with a youth football tournament on the referee side, with assessing/coaching the referees. I've just taken over doing the data entry for the referees assessment scores which we then use to determine who gets finals etc and am looking to extract more usable information from the data to help us identify trends.
I've got (up to) 200 referees, each receiving from none to two assessment scores each day for 5 days. The scores are entered as both the raw mark and the weighted mark based on match difficulty (along with a host of other data about the match that isn't relevant to this issue).
I can extract the average mark (raw and weighted) across all referees without issues and have done so using the below formula, which is the raw average mark:
=AVERAGE(Working!AK4:AK200,Working!BK4:BK200,Working!CL4:CL200,Working!DL4:DL200,Working!EM4:EM200,Working!FM4:FM200,Working!GN4:GN200,Working!HN4:HN200,Working!IO4:IO200,Working!JO4:JO200)
But I also want to extract the average mark (raw and weighted) across two subsets - Academy and non academy referees, to help plot trends and determine where resources need to be utilised.
I've attempted to use an AVERAGEIF formula, but am getting a #VALUE! return. This is the formula that I've attempted to use to return the average raw mark for those referees in the academy:
=AVERAGEIF(Working!G4:G200,Working!G4:G200="Yes",(Working!AK4:AK200,Working!BK4:BK200,Working!CL4:CL200,Working!DL4:DL200,Working!EM4:EM200,Working!FM4:FM200,Working!GN4:GN200,Working!HN4:HN200,Working!IO4:IO200,Working!JO4:JO200))
If I use the same formula as above, but without the brackets around the [average_range], I get a 'you've used too many arguments' error, and it highlights BK200.
From what I've been able to find so far online, it seems that the formula I'm trying to use would only work if ALL the cells in (Working!G4:G200) returned "Yes". However if there are only 50 academy referees as indicated by "Yes" in G column, then I want those specific scores to be averaged, and the inverse for the non-academy referees.
I thought about having another sheet, which would simply populate from column G (a simple =G4, filled down to =G200, next to all of the scores), with the raw marks consolidated into a block of columns under Assessment 1, 2, 3, 4.... and the same for the weighted marks, each populated from the equivalent cell on the working sheet. But there's a lot of filtering and re-sorting that goes on on the working sheet, and I'm not 100% certain that wouldn't cause issues.
Any feedback on how to work through this problem, so that I can display the overall average mark for academy and non-academy referees in both raw and weighted form would be much appreciated, and I apologize if this post is rather convoluted.
I don't think there is a neat solution if the scores are in several columns which are not consecutive.
My suggestion is:-
(1) Work out the sum for each column separately and total them up
(2) Work out the count for each column separately and total them up
(3) Divide Sum by Count to get Average.
In my small example below with 3 referees and 3 columns:-
(1) In K2:-
=SUMIF(H2:H4,"Yes",B2:B4)+SUMIF(H2:H4,"Yes",D2:D4)+SUMIF(H2:H4,"Yes",F2:F4)
(2) In K3:-
=COUNTIFS(B2:B4,">=0",H2:H4,"Yes")+COUNTIFS(D2:D4,">=0",H2:H4,"Yes")+COUNTIFS(F2:F4,">=0",H2:H4,"Yes")
(3) In K4:
=K2/K3
This would include any zero scores (if this is possible) but exclude any blanks.
You can then scale it up to your data.
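The sum, count and divide steps can be checked with a small Python sketch. It assumes each referee is one record with an academy flag and a few score fields, where a blank is stored as None. The function name conditional_average and the field names are made up for illustration.

```python
def conditional_average(rows, score_keys, flag_value="Yes"):
    """Average all non-blank scores for rows whose academy flag matches."""
    total = 0.0
    count = 0
    for row in rows:
        if row["academy"] != flag_value:
            continue
        for key in score_keys:
            score = row[key]
            if score is None:  # blank cell: skipped, like the COUNTIFS test
                continue
            total += score     # a zero score is still counted
            count += 1
    return total / count if count else None

# Hypothetical example with 3 referees and 3 score columns.
referees = [
    {"academy": "Yes", "s1": 7, "s2": None, "s3": 8},
    {"academy": "No",  "s1": 6, "s2": 5,    "s3": None},
    {"academy": "Yes", "s1": 0, "s2": 9,    "s3": None},
]
keys = ["s1", "s2", "s3"]
academy_avg = conditional_average(referees, keys, "Yes")
other_avg = conditional_average(referees, keys, "No")
print(academy_avg, other_avg)
```

Dividing one overall sum by one overall count weights every score equally, which is not the same as averaging the per-column averages when the columns hold different numbers of scores.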
Beyond this, you would have to change the data structure either
(1) Add a row to label the columns that you want to average e.g.
Score 1 Score 2 Score 3
3 0 3
so you could pick up only the columns labelled 3 say
Here's how it would be in my small example:-
In K3:-
=SUM((B$2:F$2=3)*($H3:$H5="Yes")*B3:F5)
Which is an array formula and must be entered with Ctrl-Shift-Enter
In K4:-
=SUM((B$2:F$2=3)*($H3:$H5="Yes")*(B3:F5<>""))
another array formula
In K5:-
=K3/K4
This is how the columns you want are labelled with a 3 in row 2, so it ignores the other columns:-
(2) Consolidate them into another sheet as you suggest.

Picking top 5 scores from a range

I run a small golf eclectic with excel. One of the things we have is a points system. I would like to get the 5 highest points scored over the season and have them ranked from 1 (being the highest points scored) to 5.
My knowledge of excel "sums" goes only a wee bit further than add and subtract.
Thanks!
If you don't want to change the order that they are presently in you can use the LARGE function. It returns the kth largest value.
Below is a great formula, if you drag it down it will automatically get the second, third and nth largest value from a table of data (in this example the data is between A1 to A10).
=LARGE(A1:A10,ROW(A1)-ROW($A$1)+1)
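You can then match the values with names or corresponding data from the table using the MATCH and INDEX functions. The example below fetches the name for each value from the second column, assuming the scores are in A1:A10 and the names in B1:B10. MATCH needs a single-column lookup range and a match type of 0 for an exact match.
=INDEX($B$1:$B$10,MATCH(cell reference with score or value,$A$1:$A$10,0))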
Play around with these formulas, they are very convenient for data m
If you have a column containing the scores, you could add a filter (Data->Filter I think) and sort descending.
Though, if you just have rows that are something like [Date][Person][Score] you'll need to go to another sheet and SUM the scores for each person then sort that... Unfortunately my Excel skills aren't up to par to pull a score for each person like that.
Given a list of numbers in A1 to A10, you can work out their 'Rank' relative to each other by using 'RANK'.
e.g.
=RANK(A1,$A$1:$A$10,0)
RANK(cell, list of cells to check against, order)
For order, 0 = descending.
From there you can work out which one is first pragmatically.
If you have Excel 2007,
Check that your data is continuous, with no blank rows or columns. Click on your scores and then select 'Data - Filter'
Use the dropdown that the filter creates at the top of your scores column and select 'Number filters - Top ten'
A 'Top ten Autofilter' dialog will be displayed. Change the number shown from 10 to 5 and then click on OK.
For earlier versions of Excel add a RANK formula in a new column. RANK works on unsorted scores, but sorting them into descending order makes the result easier to read. If there are any ties, they will be given the same ranking number and the subsequent rank number will be incremented by the number of ties (e.g. if two scores tie at rank 5, the next score will be ranked 7, not 6).
If you want to use the LARGE function as described above, make sure you put the same range in the list for each of the LARGE functions. That is, change =LARGE(A1:A10,ROW(A1)-ROW($A$1)+1) to =LARGE(A$1:A$10,ROW(A1)-ROW($A$1)+1). Otherwise the range shifts down a row each time the formula is dragged down, and you will get some strange incorrect results.
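To see what the LARGE plus INDEX/MATCH combination produces, here is a sketch of the same top-5 lookup in Python. It assumes each player already has one season total. The function name top_scores and the names and points are invented for illustration.

```python
def top_scores(points, n=5):
    """Return (rank, name, score) for the n highest scores, best first."""
    ranked = sorted(points.items(), key=lambda item: item[1], reverse=True)
    return [
        (rank, name, score)
        for rank, (name, score) in enumerate(ranked[:n], start=1)
    ]

# Hypothetical season totals.
points = {"Ann": 31, "Bob": 28, "Cal": 35, "Dee": 22, "Eve": 30, "Fay": 27}
leaders = top_scores(points)
for rank, name, score in leaders:
    print(rank, name, score)
```

Tied scores get consecutive ranks here, in the order they appear in the input, which differs from RANK in Excel where ties share a rank.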
