In Excel I have a list of values (in random order), and I want to figure out which values make up 75% of the total; i.e. when adding the values together from largest to smallest, which ones should I include to reach 75% of the total. I would like to find the "cut-off value", i.e. the smallest number to include in the group of values that together sum to 75%. However, I want to do this without first sorting my data.
Consider the example below: the cut-off is at "Company 6", which corresponds to a "cut-off value" of 750.
My data is not sorted, so I just want to figure out what the "cut-off value" should be, because then I know that if the amount in a row is at or above that number, it is part of the group of values that constitute 75% of the total.
The answer can be either an Excel formula or VBA, but I want to avoid having to sort my table first, and I want to avoid having a calculation in each row (so ideally a single formula that can calculate it).
Company        Amount   Percentage   Running Total
Company 1       1,000        12.9%           12.9%
Company 2         950        12.3%           25.2%
Company 3         900        11.6%           36.8%
Company 4         850        11.0%           47.7%
Company 5         800        10.3%           58.1%
Company 6         750         9.7%           67.7%
Company 7         700         9.0%           76.8%
Company 8         650         8.4%           85.2%
Company 9         600         7.7%           92.9%
Company 10        550         7.1%          100.0%
Total           7,750
75% of total    5,813
EDIT:
My initial thought was to use a percentile/quartile function; however, that does not give me the expected results.
I have been trying to use a combination of PERCENTRANK, SORT, SUM and AGGREGATE, but cannot figure out how to combine them to get the result I need.
In the example I want to include Companies 1 through 6, as they sum to 5,250, so the smallest number to include is 750. If I add Company 7, I get 5,950, which is above 5,813 (75% of the total).
VBA bubble sort - no changes to sheet.
Option Explicit

Sub calc75()

    Const PCENT = 0.75
    Dim rng As Range, ar, ix, x As Long, z As Long, cutoff As Double
    Dim n As Long, i As Long, a As Long, b As Long
    Dim t As Double, msg As String, bFlag As Boolean

    ' company and amount
    Set rng = Sheet1.Range("A2:B11")
    ar = rng.Value2
    n = UBound(ar)

    ' calc cutoff and initialise the index array
    ReDim ix(1 To n)
    For i = 1 To n
        ix(i) = i
        cutoff = cutoff + ar(i, 2) * PCENT
    Next

    ' sort the index array so that the amounts are in descending order
    For a = 1 To n - 1
        For b = a + 1 To n
            ' compare col B
            If ar(ix(b), 2) > ar(ix(a), 2) Then
                z = ix(a)
                ix(a) = ix(b)
                ix(b) = z
            End If
        Next
    Next

    ' result
    x = 1
    For i = 1 To n
        t = t + ar(ix(i), 2)
        If t > cutoff And Not bFlag Then
            msg = msg & vbLf & String(30, "-")
            bFlag = True
            If i > 1 Then x = i - 1 ' last sorted position still within the cutoff
        End If
        msg = msg & vbLf & i & ") " & ar(ix(i), 1) _
            & Format(ar(ix(i), 2), " 0") _
            & Format(t, " 0")
    Next

    ' x is a position in sorted order, so map it back through ix
    MsgBox msg, vbInformation, ar(ix(x), 1) & " Cutoff=" & cutoff

End Sub
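If a worksheet formula is preferred over a macro, the same idea could be wrapped in a user-defined function. The following is only a sketch and not part of the answer above: the name CutoffValue and its parameters are hypothetical, it assumes a multi-cell range of numeric amounts, and it sorts a private copy of the values so the sheet itself stays unsorted.

```vba
Option Explicit

' Hypothetical UDF (name and parameters are illustrative only).
' Returns the smallest amount among the largest values whose running total
' stays at or below ratio * total. Assumes a multi-cell range of numbers.
Public Function CutoffValue(ByVal amounts As Range, ByVal ratio As Double) As Variant
    Dim src As Variant, item As Variant
    Dim v() As Double
    Dim n As Long, i As Long, j As Long
    Dim total As Double, running As Double, tmp As Double

    src = amounts.Value2               ' 2-D, 1-based copy; the sheet is not modified
    ReDim v(1 To amounts.Cells.Count)

    ' Flatten into a 1-D array and compute the total
    For Each item In src
        n = n + 1
        v(n) = CDbl(item)
        total = total + v(n)
    Next item

    ' Simple exchange sort, descending (fine for short lists)
    For i = 1 To n - 1
        For j = i + 1 To n
            If v(j) > v(i) Then
                tmp = v(i): v(i) = v(j): v(j) = tmp
            End If
        Next j
    Next i

    ' Walk down the sorted values until adding the next one would pass the limit
    CutoffValue = CVErr(xlErrNA)       ' returned if even the largest value exceeds the limit
    For i = 1 To n
        If running + v(i) > total * ratio Then Exit For
        running = running + v(i)
        CutoffValue = v(i)
    Next i
End Function
```

With the function in a standard module, a single cell such as =CutoffValue(B2:B11, 0.75) would then give the cut-off for the example data without any helper columns.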
So, set this up simply as I suggested.
You can add or change the constraints as you wish to get the results you need - I chose Binary to start but you could limit to integer and to 1, 2 or 3 for example.
I included the roundup() I used as well as the sumproduct.
I used Binary as that gives a clear indication of the ones chosen, integer values will also do the same of course.
Smallest Value of a Running Total...
=LET(Data,B2:B11,Ratio,0.75,
Sorted,SORT(Data,,-1),MaxSum,SUM(Sorted)*Ratio,
Scanned,SCAN(0,Sorted,LAMBDA(a,b,IF((a+b)<=MaxSum,a+b,0))),
srIndex,XMATCH(0,Scanned)-1,
Result,INDEX(Sorted,srIndex),Result)
G2: =SORT(B2:B11,,-1)
H2: =SUM(B2:B11)*0.75
I2: =SCAN(0,G2#,LAMBDA(a,b,IF((a+b)<=$H$2,a+b,0)))
J2: =XMATCH(0,I2#)
K2: =INDEX(G2#,XMATCH(0,I2#)-1)
The issue that presents itself is that there could be duplicates in the Amount column, in which case it wouldn't be possible to determine which of them is the correct result.
If the company names are unique, an accurate way would be to return the company name.
=LET(rData,A2:A11,lData,B2:B11,Ratio,0.75,
Data,HSTACK(rData,lData),Sorted,SORT(Data,2,-1),
lSorted,TAKE(Sorted,,-1),MaxSum,SUM(lSorted)*Ratio,
Scanned,SCAN(0,lSorted,LAMBDA(a,b,IF((a+b)<=MaxSum,a+b,0))),
rSorted,TAKE(Sorted,,1),rIndex,XMATCH(0,Scanned)-1,
Result,INDEX(rSorted,rIndex),Result)
Note that you can define a name, e.g. GetCutOffCompany, that refers to the following LAMBDA version of the formula:
=LAMBDA(rData,lData,Ratio,LET(
Data,HSTACK(rData,lData),Sorted,SORT(Data,2,-1),
lSorted,TAKE(Sorted,,-1),MaxSum,SUM(lSorted)*Ratio,
Scanned,SCAN(0,lSorted,LAMBDA(a,b,IF((a+b)<=MaxSum,a+b,0))),
rSorted,TAKE(Sorted,,1),rIndex,XMATCH(0,Scanned)-1,
Result,INDEX(rSorted,rIndex),Result))
Then you can use the name like any other Excel function anywhere in the workbook e.g.:
=GetCutOffCompany(A2:A11,B2:B11,0.75)
I'm using Excel 2013. Simplifying the actual data, suppose you have two columns of data, similar to the below:
Column1   Column2
A         D
A         C
B         C
C         B
C         A
D         A
My goal is to either filter or remove the duplicate pairs regardless of what column they're in. My desired output would be as follows:
Column1   Column2
A         D
A         C
B         C
So, as you can see, the C-B row was eliminated because the pair already exists in the row containing B-C. The same goes for the rows containing C-A (A-C) and D-A (A-D). Data will be in no particular order, and it doesn't matter which of the two paired matches is removed.
Solution can be via creating a new table or even using VBA to output a new table. Whatever is easiest.
Thanks in advance!
I was able to get this to work for me, so I thought I'd post my answer in the hope that it helps someone else. As I suspected, and Quailia suggested, VBA was the way to go.
A few things changed along the way. I needed to add an additional column to the table, as it was determined that there would be some duplicates that I needed to keep. The values in the first column now group rows together, so duplicate pairs are only removed within the same group. The values I need to check for duplicate pairs are in columns 2 and 3.
Table1:
Column1   Column2   Column3
01        A         D
01        A         C
01        B         C
01        C         B
01        C         A
01        D         A
Also, I realized I would be dealing with a fixed number of rows, so I didn't have a need to keep changing the size of my array.
Sub RemoveDuplicatePairs()
    'Checks a 3-column table for rows that duplicate an earlier row once columns 2 and 3 are swapped,
    'removes those duplicates and outputs the unique rows to a new table
    Dim aArrayList() As Variant
    Dim bArrayList(1 To 144, 1 To 3) As Variant
    Dim aRowNum As Long, bRowNum As Long, i As Long
    Dim isDuplicate As Boolean

    aArrayList = Range("Table1").Value
    bRowNum = 1

    'Loop through rows of 1st array
    For aRowNum = 1 To UBound(aArrayList, 1)
        'Check if current row in 1st array already exists in 2nd array by comparing values
        'after flipping 2nd and 3rd column values from 1st array
        '(for the first row the 2nd array is still empty, so this loop does not run)
        isDuplicate = False
        For i = 1 To bRowNum - 1
            If aArrayList(aRowNum, 1) = bArrayList(i, 1) _
               And aArrayList(aRowNum, 2) = bArrayList(i, 3) _
               And aArrayList(aRowNum, 3) = bArrayList(i, 2) Then
                'Duplicate pair already exists, move on to next row of 1st array
                isDuplicate = True
                Exit For
            End If
        Next i

        'No duplicate found, add row from 1st array to 2nd array
        If Not isDuplicate Then
            bArrayList(bRowNum, 1) = aArrayList(aRowNum, 1)
            bArrayList(bRowNum, 2) = aArrayList(aRowNum, 2)
            bArrayList(bRowNum, 3) = aArrayList(aRowNum, 3)
            bRowNum = bRowNum + 1
        End If
    Next aRowNum

    'Write array to new table
    Sheets("Sheet1").Range("Table2").Value = bArrayList
End Sub
Table2:
Column1   Column2   Column3
01        A         D
01        A         C
01        B         C
I'm sure there is an easier, neater way to do the same thing, but as my username implies, I'm still pretty new to this. Any suggestions that improve or simplify my logic are certainly welcome.
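One possible simplification, offered only as a sketch: build a key for each row that is the same whichever way round the pair is written, and remember the keys already seen in a Scripting.Dictionary. The names PairKey and UniquePairs are hypothetical, the Dictionary is late bound (so this assumes Windows), and the values are assumed never to contain the "|" delimiter. Note that, unlike the loop above, this also drops exact repeats of a row, not just flipped ones.

```vba
Option Explicit

' Hypothetical helper: builds a key that is identical for (A, C) and (C, A)
' within the same group. Assumes the values never contain "|".
Public Function PairKey(ByVal grp As Variant, ByVal v1 As Variant, ByVal v2 As Variant) As String
    If StrComp(CStr(v1), CStr(v2), vbBinaryCompare) <= 0 Then
        PairKey = CStr(grp) & "|" & CStr(v1) & "|" & CStr(v2)
    Else
        PairKey = CStr(grp) & "|" & CStr(v2) & "|" & CStr(v1)
    End If
End Function

' Hypothetical helper: returns the rows of a 3-column, 1-based array whose
' (column 2, column 3) pair has not appeared before in either order.
Public Function UniquePairs(ByVal src As Variant) As Variant
    Dim seen As Object
    Dim result() As Variant
    Dim r As Long, c As Long, n As Long
    Dim k As String

    Set seen = CreateObject("Scripting.Dictionary")   ' late bound; Windows only
    ReDim result(1 To UBound(src, 1), 1 To 3)

    For r = 1 To UBound(src, 1)
        k = PairKey(src(r, 1), src(r, 2), src(r, 3))
        If Not seen.Exists(k) Then
            seen.Add k, True
            n = n + 1
            For c = 1 To 3
                result(n, c) = src(r, c)
            Next c
        End If
    Next r

    UniquePairs = result   ' rows after the n-th stay Empty
End Function
```

It could be called in the same way as the last line of the macro above, e.g. Sheets("Sheet1").Range("Table2").Value = UniquePairs(Range("Table1").Value). Each row is checked once against the dictionary instead of being compared with every row kept so far.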
I am trying to create a multidimensional array where the first column contains identifiers and the adjacent columns contain data relevant to that identifier. So, for instance, I would like to create an array with the following structure:
Banana 10 20 30 40
Coconut 5 10 2 4
Apple 3 4 5 6
The construction of the array begins with the definition of the relevant identifiers. So for instance in the above that would be Banana, Coconut and Apple. The data I use to construct the array would have a layout as in the below:
Banana 10 20 30 40
Parrot 5 3 1 4
Apple 3 4 5 6
Car 10 20 30 40
Donkey 4 12 3 0
Coconut 5 10 2 4
As such, I start out by defining the Banana, Coconut and Apple identifiers and then want to automatically populate my array by looping through the identifier names in the data (I have defined this range as "INPUT"). However, I am unsure how to correctly insert the adjacent data into my array every time there is a match of identifiers. I would much appreciate it if someone could explain how I can do this based on the code below.
identifierArray = Array("Banana", "Coconut", "Apple")
NumElements = UBound(identifierArray) - LBound(identifierArray) + 1

For Each Element In identifierArray
    ReDim Preserve arr(0 To NumElements, x)
    arr(i, 0) = identifierArray(i)
    i = i + 1
Next Element

For Each cell In ws.Range("INPUT")
    For Each Element In identifierArray
        If cell.Value = Element Then
            [Need help here]
        End If
    Next Element
Next cell
I don't need help with creating VLOOKUP or INDEX/MATCH solutions as that is not relevant to the above.
You can fill an array from a range on your sheet like this:
Option Explicit

Sub Test()
    Dim arr
    With ThisWorkbook.Sheets("Data")
        arr = .Range("A1:E6")
    End With
End Sub
So a range on the sheet turns into a two-dimensional, 1-based array with the same rows and columns (here arr(1 To 6, 1 To 5)).
So you don't need to loop at all to read the data, which means faster execution and less code to write.
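To keep only the rows for the chosen identifiers, the array read from the sheet can then be filtered in memory. This is a minimal sketch under a few assumptions: the names sit in the first column of the input, the function name RowsForIdentifiers is hypothetical, only the first matching row per identifier is copied, and an identifier that is not found keeps its data columns Empty. The sheet name and range are the ones used in the snippet above.

```vba
Option Explicit

' Hypothetical helper: for each identifier, copies the matching row of srcData
' (a 2-D, 1-based array whose first column holds the names) into the result.
Public Function RowsForIdentifiers(ByVal identifiers As Variant, ByVal srcData As Variant) As Variant
    Dim result() As Variant
    Dim i As Long, r As Long, c As Long, outRow As Long

    ReDim result(1 To UBound(identifiers) - LBound(identifiers) + 1, _
                 1 To UBound(srcData, 2))

    For i = LBound(identifiers) To UBound(identifiers)
        outRow = outRow + 1
        result(outRow, 1) = identifiers(i)
        For r = 1 To UBound(srcData, 1)
            If srcData(r, 1) = identifiers(i) Then
                ' Copy the adjacent values of the first matching row
                For c = 2 To UBound(srcData, 2)
                    result(outRow, c) = srcData(r, c)
                Next c
                Exit For
            End If
        Next r
    Next i

    RowsForIdentifiers = result
End Function

Sub Demo()
    Dim arr As Variant
    arr = RowsForIdentifiers(Array("Banana", "Coconut", "Apple"), _
                             ThisWorkbook.Sheets("Data").Range("A1:E6").Value)
    Debug.Print arr(2, 1), arr(2, 2)   ' second identifier and its first value
End Sub
```

The result is 1-based in both dimensions, matching what a range read returns, so it can be written back to a sheet with a single assignment if needed.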
I am stuck and would like a little help. I'm trying to get a quantity from cells D12, D14, D16, ... up to D42, then multiply this quantity by a value. The value sheet looks like this:
Value Sheet
A B C
ItemName Quality Confort
1 Chair 2 1
2 Bed 0 3
3 Table 1 1
Quantity Sheet
A B C D
ItemName QTYColumn
12 .. Table 2
13
14 .. Chair 5
15
16 .. Bed 6
Total Sheet
A B
Quality 12 (2*1 + 5*2 + 6*0 )
Confort 25 (2*1 + 5*1 + 6*3 )
I'm pretty sure I have the hardest part done. I can check and grab the quantity from all the sheets I want. I also have a function where you pass the name of the item and the stat name, and it returns the result I want.
So I have this part of the code at the moment, which doesn't work, and it's driving me nuts.
For Counter = 12 To 42 Step 2
    For Each qColumn In QTYColumn
        Set QTY = Range(qColumn & Counter)
        Dim ItemName As Range
        ItemName= QTY.Offset(-2, 0).Select
        total = total + (QTY * GetValue(ItemName, "Confort"))
    Next qColumn
Next Counter
My problem is with the ItemName variable. It's always empty, and as soon as I reach it with the debugger, the function stops and exits. Does anyone have any idea why? It's important for me to get it based on the offset of -2 and not the column address, because the column might differ depending on the sheet, and the only "sure" way to find it is to get the 2nd cell to the left of the quantity cell.
ItemName= QTY.Offset(-2, 0).Select does not make sense: Select selects the cell, it does not return it.
Either you Select:
QTY.Offset(-2, 0).Select
or you get the value:
ItemName= QTY.Offset(-2, 0).Value '(value can be omitted here)
But then, Dim ItemName As Range does not make sense. It should be a String or a number.
or you get the range:
Set ItemName= QTY.Offset(-2, 0) ' then you need Set
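One more thing to check: Offset takes the row offset first and the column offset second, so Offset(-2, 0) points two rows up, while the second cell to the left is Offset(0, -2). Putting this together, the loop could look something like the sketch below. It rests on assumptions: qtyColumns is taken to be an array of column letters, the function name StatTotal is hypothetical, and the GetValue shown here is only a placeholder with an assumed signature standing in for the existing lookup function.

```vba
Option Explicit

' Placeholder for the existing lookup function (assumed signature);
' replace it with the real one.
Private Function GetValue(ByVal itemName As String, ByVal statName As String) As Double
    GetValue = 1
End Function

' Sums quantity * stat for every second row from 12 to 42 in the given columns.
' qtyColumns is assumed to be an array of column letters, e.g. Array("D"),
' with each quantity column at least two columns away from column A.
Public Function StatTotal(ByVal ws As Worksheet, ByVal qtyColumns As Variant, _
                          ByVal statName As String) As Double
    Dim counter As Long
    Dim qColumn As Variant
    Dim qty As Range
    Dim itemName As String

    For counter = 12 To 42 Step 2
        For Each qColumn In qtyColumns
            Set qty = ws.Range(qColumn & counter)
            ' Offset(rowOffset, columnOffset): same row, two columns to the left
            itemName = CStr(qty.Offset(0, -2).Value)
            If Len(itemName) > 0 And IsNumeric(qty.Value) Then
                StatTotal = StatTotal + qty.Value * GetValue(itemName, statName)
            End If
        Next qColumn
    Next counter
End Function
```

A call such as StatTotal(ActiveSheet, Array("D"), "Confort") would then return the total for one stat. Reading the name into a String avoids the need for Set altogether.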
I have an Excel file with several columns and many rows. One column, say A, has ID numbers. Another column, say G, has prices. Column A has repeating ID numbers; however, not all numbers repeat the same number of times: sometimes just once, other times 2, 3 or several times. Each row has its own price in column G.
Basically, I need to average those prices for a given ID in column A. If each ID were repeated the same number of times, this would be quite simple, but because they are not, I have to do my average calculation manually for each grouping. Since my spreadsheet has a great many rows, this is taking forever.
Here is an example (column H is the average that I am currently calculating manually):
      A      ...   G      H
1     1234         3.00   3.50
2     1234         4.00
3     3456         2.25   3.98
4     3456         4.54
5     3456         5.15
11    8890         0.70   0.95
13    8890         1.20
...
So in the above example, the average price for ID# 1234 would be 3.50. Likewise, the average price for ID# 3456 would be 3.98 and for #8890 would be 0.95.
NOTICE how rows are missing between row 5 and 11, and row 12 is missing too? That is because they are filtered out for some other reason. I need to exclude those hidden rows from my calculations and only calculate the average for the rows visible.
I'm trying to write a VBA script that will automatically calculate this, then print that average value for each ID in column H.
Here is some code I have considered:
Sub calcAvg()
    Dim rng As Range
    Set rng = Range("sheet1!A1:A200003")
    For Each Val In rng
        Count = 0
        V = Val.Value '''V is set equal to the value within the range
        If Val.Value = V Then
            Sum = Sum + G.Value
            V = rng.Offset(1, 0) '''go to next row
            Count = Count + 1
        Else
            '''V = Val.Value '''set value in this cell equal to the value in the next cell down.
            avg = Sum / Count
            H = avg '''Column H gets the avg value.
        End If
    Next Val
End Sub
I know there are some problems with the above code. I'm not too familiar with VBA. Also, this would print the average on the same line every time. I'm not sure how to iterate over the rows.
This seems overly complicated. It's a simple problem in theory, but the missing rows and differing number of ID repetitions make it more complex.
If this can be done with an Excel function, that would be even better.
Any thoughts or suggestions would be greatly appreciated. Thanks.
If you can add another row to the top of your data (put column headers in it), it's quite simple with a formula.
Formula for C2 is
=IF(A2<>A1,AVERAGEIFS(B:B,A:A,A2),"")
copy this down for all data rows.
This applies to Excel 2007 or later. In Excel 2003 or earlier, where AVERAGEIFS is not available, divide SUMIF by COUNTIF instead, e.g. SUMIF(A:A,A2,B:B)/COUNTIF(A:A,A2), adjusting ranges accordingly.
If you can't add a header row, change the first formula (cell C1) to
=AVERAGEIFS(B:B,A:A,A1)
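Bear in mind that AVERAGEIFS also counts rows hidden by a filter. If only the visible rows should go into the average, as the question asks, a short macro can skip hidden rows. The following is a sketch rather than a tested solution for the real workbook: it assumes the layout from the question (IDs in column A, prices in column G, averages written to column H), data starting in row 1 of the active sheet, numeric prices, and equal IDs sitting on adjacent visible rows. The name AverageVisibleById is made up for this example.

```vba
Option Explicit

' Sketch: writes the average of the visible prices (column G) for each run of
' equal IDs (column A) into column H on the first visible row of that run.
Sub AverageVisibleById()
    Dim ws As Worksheet
    Dim r As Long, lastRow As Long, firstRow As Long
    Dim total As Double, n As Long
    Dim currentId As Variant

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    For r = 1 To lastRow
        If Not ws.Rows(r).Hidden Then
            ' A new ID closes the previous group
            If n > 0 And ws.Cells(r, "A").Value <> currentId Then
                ws.Cells(firstRow, "H").Value = total / n
                n = 0
            End If
            ' Start a new group on its first visible row
            If n = 0 Then
                currentId = ws.Cells(r, "A").Value
                firstRow = r
                total = 0
            End If
            total = total + ws.Cells(r, "G").Value
            n = n + 1
        End If
    Next r

    ' Write the average of the last group
    If n > 0 Then ws.Cells(firstRow, "H").Value = total / n
End Sub
```

Because it walks the rows once and only keeps a running sum and count, it should cope with a large sheet, though reading the columns into arrays first would be faster still.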
In my way ..
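Sub calcAvg()
    Dim x As Long, y As Long, y2 As Long, i As Long, t As Long
    Dim Count As Double, Mount As Long
    Dim Seek0 As String
    x = 1   '--> means Col A
    y = 1   '--> means start - Row 1
    y2 = 7  '--> means end - Row 7
    For i = y To y2
        If i = y Then
            Seek0 = Cells(i, x)
            t = i
            Count = Cells(i, x + 6)   ' running sum of prices (Col G)
            Mount = 1                 ' number of rows for this ID
        Else
            If Cells(i, x) <> Seek0 Then
                Cells(t, x + 7) = Count / Mount   ' average of the previous ID goes to Col H
                Count = Cells(i, x + 6)
                Mount = 1
                t = i
                Seek0 = Cells(i, x)
            Else
                Count = Count + Cells(i, x + 6)
                Mount = Mount + 1
            End If
        End If
    Next
    ' write the average of the last ID, which the loop never reaches
    If Mount > 0 Then Cells(t, x + 7) = Count / Mount
End Sub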
Hope this helps ..