In Excel I have a list of values (in random order), and I want to figure out which values make up 75% of the total; i.e. when adding the values together from largest to smallest, which ones should I include to reach 75% of the total? I would like to find the "cut-off value", i.e. the smallest number to include in the group of values (that combined sum up to 75%). However, I want to do this without first sorting my data.
Consider the example below: the cutoff is at "Company 6", which corresponds to a "cut-off value" of 750.
The data I have is not sorted, so I just want to figure out what the "cut-off value" should be; then I know that if the amount in a row is at or above that number, it is part of the group of values that constitute 75% of the total.
The answer can be either Excel or VBA; but I want to avoid having to sort my table first, and I want to avoid having a calculation in each row (so ideally a single formula that can calculate it).
Row number     Amount   Percentage   Running Total
Company 1       1,000        12.9%           12.9%
Company 2         950        12.3%           25.2%
Company 3         900        11.6%           36.8%
Company 4         850        11.0%           47.7%
Company 5         800        10.3%           58.1%
Company 6         750         9.7%           67.7%
Company 7         700         9.0%           76.8%
Company 8         650         8.4%           85.2%
Company 9         600         7.7%           92.9%
Company 10        550         7.1%          100.0%
Total           7,750
75% of total    5,813
EDIT:
My initial thought was to use a percentile/quartile function; however, that does not give me the expected results.
I have been trying to use a combination of PERCENTRANK, SORT, SUM and AGGREGATE, but cannot figure out how to combine them to get the result I need.
In the example I want to include Companies 1 through 6, as they sum to 5250, so the smallest number to include is 750. If I add Company 7, I go above 5813 (which is where 75% is).
VBA bubble sort - no changes to sheet.
Option Explicit
Sub calc75()
    Const PCENT = 0.75
    Dim rng, ar, ix, x As Long, z As Long, cutoff As Double
    Dim n As Long, i As Long, a As Long, b As Long
    Dim t As Double, msg As String, prev As Long, bFlag As Boolean
    ' company and amount
    Set rng = Sheet1.Range("A2:B11")
    ar = rng.Value2
    n = UBound(ar)
    ' calc cutoff
    ReDim ix(1 To n)
    For i = 1 To n
        ix(i) = i
        cutoff = cutoff + ar(i, 2) * PCENT
    Next
    ' bubble sort on the index array (descending by amount); the data is untouched
    For a = 1 To n - 1
        For b = a + 1 To n
            ' compare col B
            If ar(ix(b), 2) > ar(ix(a), 2) Then
                z = ix(a)
                ix(a) = ix(b)
                ix(b) = z
            End If
        Next
    Next
    ' result
    x = 1
    For i = 1 To n
        t = t + ar(ix(i), 2)
        If t > cutoff And Not bFlag Then
            msg = msg & vbLf & String(30, "-")
            bFlag = True
            If i > 1 Then x = i - 1
        End If
        msg = msg & vbLf & i & ") " & ar(ix(i), 1) _
            & Format(ar(ix(i), 2), " 0") _
            & Format(t, " 0")
    Next
    ' x is a position in the sorted order, so map it back through ix
    MsgBox msg, vbInformation, ar(ix(x), 1) & " Cutoff=" & cutoff
End Sub
So, set this up simply as I suggested.
You can add or change the constraints as you wish to get the results you need - I chose Binary to start but you could limit to integer and to 1, 2 or 3 for example.
I included the roundup() I used as well as the sumproduct.
I used Binary as that gives a clear indication of the ones chosen, integer values will also do the same of course.
Smallest Value of a Running Total...
=LET(Data,B2:B11,Ratio,0.75,
Sorted,SORT(Data,,-1),MaxSum,SUM(Sorted)*Ratio,
Scanned,SCAN(0,Sorted,LAMBDA(a,b,IF((a+b)<=MaxSum,a+b,0))),
srIndex,XMATCH(0,Scanned)-1,
Result,INDEX(Sorted,srIndex),Result)
G2: =SORT(B2:B11,,-1)
H2: =SUM(B2:B11)*0.75
I2: =SCAN(0,G2#,LAMBDA(a,b,IF((a+b)<=$H$2,a+b,0)))
J2: =XMATCH(0,I2#)
K2: =INDEX(G2#,XMATCH(0,I2#)-1)
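To check the logic of the formulas above outside Excel, here is a minimal Python sketch of the same idea: sort a copy in descending order, accumulate, and keep the last value whose running total is still within 75% of the total. The function name `cutoff_value` and the shuffled sample list are assumptions made for illustration; the amounts themselves are the ones from the question.

```python
from itertools import accumulate


def cutoff_value(amounts, ratio=0.75):
    """Smallest amount in the largest-first group whose running total
    stays at or below ratio * total. Returns None if even the largest
    amount alone exceeds the limit."""
    ordered = sorted(amounts, reverse=True)   # sorted copy; the input is untouched
    limit = sum(ordered) * ratio
    result = None
    for value, running in zip(ordered, accumulate(ordered)):
        if running > limit:
            break
        result = value
    return result


# The question's ten amounts, deliberately out of order
amounts = [800, 550, 1000, 700, 950, 600, 900, 650, 850, 750]
print(cutoff_value(amounts))
```

Returning None for the "nothing fits" case is a design choice of this sketch; the Excel formulas do not handle that edge case explicitly.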
The issue that presents itself is that there could be duplicates in the Amount column, in which case it wouldn't be possible to determine which of them is the correct result.
If the company names are unique, an accurate way would be to return the company name.
=LET(rData,A2:A11,lData,B2:B11,Ratio,0.75,
Data,HSTACK(rData,lData),Sorted,SORT(Data,2,-1),
lSorted,TAKE(Sorted,,-1),MaxSum,SUM(lSorted)*Ratio,
Scanned,SCAN(0,lSorted,LAMBDA(a,b,IF((a+b)<=MaxSum,a+b,0))),
rSorted,TAKE(Sorted,,1),rIndex,XMATCH(0,Scanned)-1,
Result,INDEX(rSorted,rIndex),Result)
Note that you can define a name, e.g. GetCutOffCompany, with the following LAMBDA version of the formula:
=LAMBDA(rData,lData,Ratio,LET(
Data,HSTACK(rData,lData),Sorted,SORT(Data,2,-1),
lSorted,TAKE(Sorted,,-1),MaxSum,SUM(lSorted)*Ratio,
Scanned,SCAN(0,lSorted,LAMBDA(a,b,IF((a+b)<=MaxSum,a+b,0))),
rSorted,TAKE(Sorted,,1),rIndex,XMATCH(0,Scanned)-1,
Result,INDEX(rSorted,rIndex),Result))
Then you can use the name like any other Excel function anywhere in the workbook e.g.:
=GetCutOffCompany(A2:A11,B2:B11,0.75)
Let's say I have the following data:
A B C
1 =B1
2 =C2
3 =C3
I want to write a macro that can add text around the values in column A without losing the reference.
For example
A B C
1 [Hello].[1] 1
2 [Hello].[20] 20
3 [Hello].[10] 10
As an example for A1 I use:
.Cells(1, 1) = "[Hello].[" & .Cells(1, 2) & "]"
This gives me the wanted value, but I lose the cell reference in A1.
I would rather have this:
A B C
1 [Hello].[B1] 1
2 [Hello].[C2] 20
3 [Hello].[C3] 10
Of course with the actual value of the reference and not just the addresses.
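With .Cells(1, 1)
    ' inside this With, .Cells(1, 2) is relative to the cell itself, i.e. one column to the right
    .Formula = "=" & .Cells(1, 2).Address
    ' custom number format: literal text wrapped around the referenced value
    .NumberFormat = """[Hello].[""0""]"""
End With
This should do it. For more on Excel's custom number format syntax, see https://exceljet.net/custom-number-formats
Best of luck, but consider using a language other than VBA.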
How can I generate random numbers 0 or 1 in 10 cells in the row, in which the sum of the random number is always equal to 7?
Here's a way to get seven "1"s and three "0"s in random order using RAND and RANK
In A1:J1: =RAND()
In A2:J2: =IF(RANK(A1,$A$1:$J$1,1)>3,1,0)
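The RAND/RANK idea can be sketched in Python to see why the sum is always 7: each of the ten draws gets a distinct ascending rank from 1 to 10, and exactly seven of those ranks are greater than 3. The function name `seven_ones` and its parameters are hypothetical, and the sketch assumes no ties among the random draws (as the worksheet version effectively does).

```python
import random


def seven_ones(n=10, ones=7, rng=random):
    """Return n values of 0/1 in random order with exactly `ones` ones."""
    draws = [rng.random() for _ in range(n)]          # row 1: =RAND()
    # Ascending rank of each draw, 1 = smallest (like RANK(..., 1) without ties)
    order = sorted(range(n), key=lambda i: draws[i])
    rank = [0] * n
    for position, i in enumerate(order, start=1):
        rank[i] = position
    zeros = n - ones
    # Row 2: =IF(RANK(...)>3,1,0) when zeros is 3
    return [1 if r > zeros else 0 for r in rank]


print(seven_ones())
```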
Available here is a version that I really think works! https://www.dropbox.com/s/ec431fu0h0fhb5i/RandomNumbers.xlsx?dl=0
And here's the '0 and 1' version (sheet 2 at the above link):
De-dup        Rankings   Randoms       First Cut   Sorted
0.47999002    7          0.479992063   1           1
0.68823003    3          0.688233075   1           1
0.07594004    9          0.075938331   1           1
0.02077005    10         0.020766892   1           0
0.69217006    2          0.692170173   1           0
0.73355007    1          0.733549516   1           1
0.51546008    6          0.515462872   1           1
0.62308009    4          0.623078278               0
0.33033001    8          0.330331577               1
0.561260011   5          0.561260557               1
Formulae for columns A-C exactly as before, D is just 7 1's, E is:
=VLOOKUP(ROW(E2)-1,B$1:D$11,3,FALSE)
Assuming that you want a list of positive random numbers that add to 7, you can use the following method.
Enter a 0 in the top-left cell (Blue Cell).
Enter =RAND()*7 into the next 9 cells below the 0 (Orange Cells).
Enter a 7 in the cell below the 9 random values (Blue Cell).
Copy the 9 random values and paste-special-values over top to turn the formulas into values.
Sort just these 9 cells in ascending order
In the cell just to the right of the first random value put a formula that subtracts the cell to the left and one above from the cell to the left (Yellow Cells).
Repeat this formula down to the cell next to the 7 that was typed in.
Sum the values in the second column (Green Cell).
That should give you 10 random values whose sum is exactly 7.
The only issue is that getting the values to be between 0 and 1 will take a bit of trial and error.
It appears that trial and error may not be practical: only about one time in 2,710 will this list contain only numbers between 0 and 1. Sorry.
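The odds quoted above can be cross-checked with the classical inclusion-exclusion formula for the largest gap between sorted uniform points. This is a sketch under the assumption that the 9 values are independent uniform draws on [0, 7], as produced by =RAND()*7; the function name `prob_all_gaps_at_most` is hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_all_gaps_at_most(limit, total, parts):
    """P(each of `parts` gaps is <= limit) when parts-1 independent
    uniform points split the interval [0, total]."""
    x = Fraction(limit) / Fraction(total)
    p = Fraction(0)
    for k in range(parts + 1):
        rest = 1 - k * x
        if rest <= 0:
            break
        p += (-1) ** k * comb(parts, k) * rest ** (parts - 1)
    return p


p = prob_all_gaps_at_most(1, 7, 10)
print(float(p), float(1 / p))
```

Under these assumptions the exact value comes out at roughly one in 2,760, the same order as the estimate given above.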
To answer the question in the post, enter this in A1:J1 as an array formula (ctrl+shift+enter):
=1-(TRANSPOSE(MOD(SMALL(RANDBETWEEN(0,1e12*(ROW(INDIRECT("1:10"))>0))+(ROW(INDIRECT("1:10"))-1)/10,ROW(INDIRECT("1:10"))),1))>0.65)
To answer the question in the post title, do the following:
In A1:J1 enter:
=RAND()
In K1 enter:
=IF(SUM(A1:J1)<7,(7-SUM(A1:J1))/(COUNT(A1:J1)-7),7/SUM(A1:J1))
In L1 enter:
=IF(SUM($A1:$J1)<7,(A1+$K1)/($K1+1),A1*$K1)
Fill over to U1.
I believe the 10 numbers generated will be identically distributed in [0,1), but obviously not uniformly (I'm fairly certain the distribution does not have a name). The numbers can't be considered independent. A few statistics on the distribution:
Mean: 0.7 (as expected)
The other statistics are estimated from 10,000 simulations:
Variance: 0.0295
Kurtosis: -0.648
Skewness: -0.192
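A short Python sketch of the K1/L1 formulas makes it easier to see why the result always sums to 7 and stays within [0, 1]. The function name `rescale_to_sum` is an assumption for illustration; the arithmetic mirrors the two branches of the worksheet formulas.

```python
import random


def rescale_to_sum(values, target=7.0):
    """Mirror the K1/L1 formulas: map values in [0, 1) to values
    in [0, 1] whose sum equals target."""
    s, n = sum(values), len(values)
    if s < target:
        k = (target - s) / (n - target)        # K1 when the sum is too small
        return [(v + k) / (k + 1) for v in values]
    k = target / s                             # K1 when the sum is too large
    return [v * k for v in values]


row = [random.random() for _ in range(10)]     # A1:J1: =RAND()
adjusted = rescale_to_sum(row)                 # L1:U1
print(sum(adjusted), min(adjusted), max(adjusted))
```

In the first branch, with S the original sum and k = (7 - S)/3, the new sum is (S + 10k)/(k + 1) = (70 - 7S)/(10 - S) = 7; in the second branch every value is simply scaled by 7/S.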
Think of it as drawing a sample of size 7 from the set {1, 2, ..., 10}. The 1s correspond to the numbers chosen for inclusion in the sample. Here is some VBA code which generates such samples:
Function sample(n As Long, k As Long) As Variant
    'returns a variant array of length n
    'consisting of k 1s and n-k 0s
    'thought of as a sample of {1,...,n} of size k
    Dim v As Variant 'vector to hold sample
    Dim numbers As Variant
    Dim i As Long, j As Long, temp As Long
    ReDim v(1 To n)
    ReDim numbers(1 To n)
    For i = 1 To n
        v(i) = 0
        numbers(i) = i
    Next i
    'do k steps of a Fisher-Yates shuffle on numbers
    For i = 1 To Application.WorksheetFunction.Min(k, n - 1)
        j = Application.WorksheetFunction.RandBetween(i, n)
        If i < j Then 'swap
            temp = numbers(i)
            numbers(i) = numbers(j)
            numbers(j) = temp
        End If
    Next i
    'now use the first k elements of the partially shuffled array:
    For i = 1 To k
        v(numbers(i)) = 1
    Next i
    sample = v
End Function
Used like: Range("A1:J1").Value = sample(10,7)
Using a bit of brute force, I think I've got a workable solution to the original version of the question which asked for random numbers between 0 and 1.
Cells A1 to A9:
=rand()
Cell A10:
=7-sum(A1:A9)
Now you have 10 numbers that add up to 7, but the last one is probably not in the range 0 to 1. To deal with that, just recalculate the sheet to generate new random numbers until that last value is within range. Each recalculation has only about a 4% chance of success, so it takes about 25 recalculations on average (and roughly 73 for a ~95% chance that one of them is within range), so it could take a while. A little VBA can do that for you very quickly:
Sub rand7()
    While Range("A10").Value > 1 Or Range("A10").Value < 0
        ActiveSheet.Calculate
    Wend
End Sub
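How long the loop is likely to run can be estimated exactly rather than by trial. The sketch below assumes A1:A9 are independent Uniform(0,1) draws, so their sum follows the Irwin-Hall distribution, and A10 = 7 - SUM(A1:A9) is within [0, 1] exactly when that sum is between 6 and 7. The function name `irwin_hall_cdf` is hypothetical.

```python
from math import ceil, comb, factorial, floor, log


def irwin_hall_cdf(x, n):
    """P(sum of n independent Uniform(0,1) draws <= x)."""
    if x <= 0:
        return 0.0
    if x >= n:
        return 1.0
    terms = ((-1) ** k * comb(n, k) * (x - k) ** n for k in range(floor(x) + 1))
    return sum(terms) / factorial(n)


# Chance that a single recalculation leaves A10 inside [0, 1]
p = irwin_hall_cdf(7, 9) - irwin_hall_cdf(6, 9)
mean_tries = 1 / p                              # expected number of recalculations
tries_for_95 = ceil(log(0.05) / log(1 - p))     # recalculations for a 95% chance
print(p, mean_tries, tries_for_95)
```

Under these assumptions p is close to 4%, which corresponds to about 25 recalculations on average and about 73 for a 95% chance of at least one success.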
I have a workbook full of invoices, formatted for printing, and I need to sum the Total Due for all 295 invoices. The first Total Due is in H22 with the next amount 50 cells below in H71 (not including H22.) The rest of Column H contains the individual amounts that compose the Total Due and various text. Like this:
Charges
10
20
30
Amount Due:
$60.00
If it would be easier to sum the range that makes up the total, that works too.
Each range is 10 rows, H11:H20, then 50 below that at H60:H69.
Edit: Cell H22, not H32
Use this array formula:
=SUM(IF(MOD(ROW($H$1:INDEX($H:$H,MATCH(1E+99,$H:$H)))-1,50)=21,$H$1:INDEX($H:$H,MATCH(1E+99,$H:$H))))
Being an array formula, it must be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode. If done correctly, Excel will put {} around the formula.
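To see which rows the MOD test in the array formula actually selects, here is a small Python sketch. The helper names `selected_rows` and `sum_selected` are hypothetical. MOD(ROW()-1,50)=21 picks rows 22, 72, 122 and so on, i.e. totals spaced 50 rows apart; the question also mentions H71 as the second total, which would correspond to a spacing of 49, so the period is left as a parameter to match the real sheet.

```python
def selected_rows(last_row, first=22, period=50):
    """Row numbers from `first` up to last_row that are `period` rows apart."""
    return [r for r in range(first, last_row + 1) if (r - first) % period == 0]


def sum_selected(column, first=22, period=50):
    """Sum the numeric cells on the selected rows; column[0] stands for row 1."""
    rows = selected_rows(len(column), first, period)
    return sum(column[r - 1] for r in rows
               if isinstance(column[r - 1], (int, float)))


print(selected_rows(130))              # rows matched by MOD(ROW()-1,50)=21
print(selected_rows(130, period=49))   # rows if the totals were 49 rows apart
```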
You can use VBA. Create a new module, and paste in:
Option Explicit
Public Function sum_with_gaps(r As Range, gapSize As Integer, dataSize As Integer) As Double
    Dim ret As Double, i%
    ret = 0
    i = 1
    Do While i + dataSize - 1 <= r.Count
        ' sum the dataSize cells starting at the i-th cell of r
        ret = ret + WorksheetFunction.Sum(r.Cells(i, 1).Resize(dataSize, 1))
        i = i + dataSize + gapSize
    Loop
    sum_with_gaps = ret
End Function
Test data:
a
5.1
5.2
5.3
5.4
b
b
6.1
6.2
6.3
6.4
c
c
7.1
7.2
7.3
7.4
d
=sum_with_gaps(A2:A17,2,4)
I have an Excel file with several columns and many rows. One column, say A, has ID numbers. Another column, say G, has prices. Column A has repeating ID numbers, but not all numbers repeat the same number of times: sometimes just once, other times 2, 3 or several times. Each row has its own price in column G.
Basically, I need to average those prices for a given ID in column A. If each ID were repeated the same number of times, this would be quite simple, but because they are not, I have to do my average calculation manually for each grouping. Since my spreadsheet has many, many rows, this is taking forever.
Here is an example (column H is the average that I am currently calculating manually):
      A     ...   G      H
1     1234        3.00   3.50
2     1234        4.00
3     3456        2.25   3.98
4     3456        4.54
5     3456        5.15
11    8890        0.70   0.95
13    8890        1.20
...
So in the above example, the average price for ID# 1234 would be 3.50. Likewise, the average price for ID# 3456 would be 3.98 and for #8890 would be 0.95.
NOTICE how rows are missing between row 5 and 11, and row 12 is missing too? That is because they are filtered out for some other reason. I need to exclude those hidden rows from my calculations and only calculate the average for the rows visible.
I'm trying to write a VBA script that will automatically calculate this, then print that average value for each ID in column H.
Here is some code I have considered:
Sub calcAvg()
Dim rng As Range
Set rng = Range("sheet1!A1:A200003")
For Each Val In rng
Count = 0
V = Val.Value '''V is set equal to the value within the range
If Val.Value = V Then
Sum = Sum + G.Value
V = rng.Offset(1, 0) '''go to next row
Count = Count + 1
Else
'''V = Val.Value '''set value in this cell equal to the value in the next cell down.
avg = Sum / Count
H = avg '''Column G gets the avg value.
End If
Next Val
End Sub
I know there are some problems with the above code. I'm not too familiar with VBA. Also, this would print the avg on the same line every time. I'm not sure how to iterate over the rows.
This seems overly complicated. It's a simple problem in theory, but the missing rows and differing number of ID# repetitions make it more complex.
If this can be done in an Excel function, that would be even better.
Any thoughts or suggestions would be greatly appreciated. thanks.
If you can add another row to the top of your data (put column headers in it), it's quite simple with a formula.
Formula for C2 is
=IF(A2<>A1,AVERAGEIFS(B:B,A:A,A2),"")
copy this down for all data rows.
This applies to Excel 2007 or later, where AVERAGEIFS (and AVERAGEIF) were introduced. In Excel 2003 or earlier, use SUMIF divided by COUNTIF instead, adjusting ranges accordingly.
If you can't add a header row, change the first formula (cell C1) to
=AVERAGEIFS(B:B,A:A,A1)
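Note that AVERAGEIFS also counts rows hidden by a filter. As a way to check expected results for the visible-rows-only requirement in the question, here is a minimal Python sketch that averages each run of equal IDs over visible rows and reports the average against the first visible row of the run. The function name `visible_group_averages`, the tuple layout, and the prices on the hidden rows are assumptions for illustration.

```python
def visible_group_averages(rows):
    """rows: iterable of (row_number, id, price, hidden).
    Returns {first_visible_row_of_group: average price of visible rows}."""
    averages = {}
    current = object()          # sentinel that matches no real ID
    start = None
    total = 0.0
    count = 0
    for row_number, id_, price, hidden in rows:
        if hidden:
            continue            # filtered-out rows are ignored entirely
        if id_ != current:
            if start is not None:
                averages[start] = total / count
            current, start, total, count = id_, row_number, 0.0, 0
        total += price
        count += 1
    if start is not None:       # do not forget the last group
        averages[start] = total / count
    return averages


# Visible rows are from the question; hidden rows carry made-up prices
data = [
    (1, 1234, 3.00, False), (2, 1234, 4.00, False),
    (3, 3456, 2.25, False), (4, 3456, 4.54, False), (5, 3456, 5.15, False),
    (6, 3456, 9.99, True),
    (11, 8890, 0.70, False), (12, 8890, 9.99, True), (13, 8890, 1.20, False),
]
print(visible_group_averages(data))
```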
In my way ..
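Sub calcAvg()
    Dim x As Long, y As Long, y2 As Long, i As Long, t As Long
    Dim Count As Double, Mount As Long   ' Count = running sum, Mount = number of rows
    Dim Seek0 As String
    x = 1   '--> means Col A
    y = 1   '--> means start - Row 1
    y2 = 7  '--> means end - Row 7
    For i = y To y2
        If Not Rows(i).Hidden Then   ' skip rows hidden by the filter
            If t = 0 Or Cells(i, x) <> Seek0 Then
                ' new ID: write the previous group's average, then start a new group
                If t > 0 Then Cells(t, x + 7) = Count / Mount
                Seek0 = Cells(i, x)
                t = i
                Count = Cells(i, x + 6)
                Mount = 1
            Else
                Count = Count + Cells(i, x + 6)
                Mount = Mount + 1
            End If
        End If
    Next
    ' write the average for the last group
    If t > 0 Then Cells(t, x + 7) = Count / Mount
End Sub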
Hope this helps ..