How to calculate the average fill rate of a hash table? - hashmap

I'm looking at a hash table implementation and it says that the table will grow by doubling in size when it's 75% full, which gives it an average fill rate of:
(75 + 75 / 2) / 2 = 56%.
How did the author arrive at this formula? If the table tripled in size, would the 2's become 3's?

(75 + 75 / 2) / 2 = 56%.
It's basically saying that when it comes time to resize, the table will be 75% full (the first 75), but the prior resize would have happened when the table was half as big, so at that time the number of elements would have been half as many as needed for this resize, hence 75 / 2. Outside the parentheses, the trailing / 2 takes the average of these two load factors.
If the table tripled in size, then we'd have:
(75 + 75 / 3) / 2 = 50%.
That reflects the load factor after a resize being only 25% now, but we still have a trailing / 2 to get an average over that initial 25% load factor and the 75% load factor at which it will resize again.
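The reasoning above can be sketched as a small Python function. This is only an illustration of the quoted formula: the names `average_fill`, `max_load` and `growth` are made up here, and it assumes (as the formula does) that the average is simply the midpoint between the load just after a resize and the load just before the next one.

```python
def average_fill(max_load, growth):
    """Midpoint between the load right after a resize and the resize threshold.

    max_load: load factor (in %) at which the table grows, e.g. 75
    growth:   factor by which the capacity is multiplied, e.g. 2 or 3
    """
    load_after_resize = max_load / growth  # same elements, `growth` times the slots
    return (max_load + load_after_resize) / 2


print(average_fill(75, 2))  # doubling: 56.25
print(average_fill(75, 3))  # tripling: 50.0
```

Only the inner divisor changes with the growth factor; the outer division by 2 stays because it averages two values.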

Related

Rank order data

I have the loan dataset below -
Sector         Total Units   Bad units   Bad Rate
Retail Trade   16            5           31%
Construction   500           1100        20%
Healthcare     165           55          33%
Mining         3             2           67%
Utilities      56            19          34%
Other          300           44          15%
How can I create a ranking function to sort this data based on the bad_rate while also accounting for the number of units?
E.g., this is the result when I sort in descending order based on bad_rate:
Sector         Total Units   Bad units   Bad Rate
Mining         3             2           67%
Utilities      56            19          34%
Healthcare     165           55          33%
Retail Trade   16            5           31%
Construction   500           1100        20%
Other          300           44          15%
Here, Mining shows up first, but I don't really care about this sector as it only has a total of 3 units. I would like Construction, Other and Healthcare to show up at the top, as they have more total units as well as more bad units.
STEP 1) is easy...
Use SORT("Range","ByColNumber","Order")
Just put it in the top left cell of where you want your sorted data.
=SORT(B3:E8,4,-1):
STEP 2)
Here's the tricky part... you need to decide how to weight the ranking.
Here, I tried multiplying the Rate% by the rank of Total Units:
I think this approach gives pretty good results... you just need to play with the formula!
Please let me know what formula you eventually use!
You would need to define sorting criteria, since you don't have a priority based on column, but a combination instead. I would suggest defining a function that weights both columns: Total Units and Bad Rate. Using a weight function would be a good idea, but first, we would need to normalize both columns. For example put the data in a range 0-100, so we can weight each column having similar values. Once you have the data normalized then you can use criteria like this:
w_1 * x + w_2 * y
This is the main idea. Now to put this logic in Excel. We create an additional temporary variable with the previous calculation and name it crit. We define a user LAMBDA function SORT_BY for calculating crit as follows:
LAMBDA(a,b, wu*a + wbr*b)
and we use MAP to calculate it with the normalized data. For convenience we define another user LAMBDA function to normalize the data: NORM as follows:
LAMBDA(x, 100*(x-MIN(x))/(MAX(x) - MIN(x)))
Note: The above formula ensures a 0-100 range, but because we are going to use weights, maybe it is better to use a 1-100 range, so the weight takes effect for the minimum value too. In that case it can be defined as follows:
LAMBDA(x, ( 100*(x-MIN(x)) + (MAX(x)-x) )/(MAX(x)-MIN(x)))
Here is the formula normalizing for 0-100 range:
=LET(wu, 0.6, wbr, 0.8, u, B2:B7, br, D2:D7, SORT_BY, LAMBDA(a,b, wu*a + wbr*b),
NORM, LAMBDA(x, 100*(x-MIN(x))/(MAX(x) - MIN(x))),
crit, MAP(NORM(u), NORM(br), LAMBDA(a,b, SORT_BY(a,b))),
DROP(SORT(HSTACK(A2:D7, crit),5,-1),,-1))
You can customize how to weight each column (via wu for Total Units and wbr for Bad Rates columns). Finally, we present the result removing the sorting criteria (crit) via the DROP function. If you want to show it, then remove this step.
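For readers who want to experiment with the weights outside Excel, here is a minimal Python sketch of the same idea (min-max normalize both columns to 0-100, then sort by the weighted sum). The function names `norm` and `rank` and the row layout are chosen for this illustration only; the data is copied from the question.

```python
# Rows copied from the question: (sector, total units, bad rate in %).
ROWS = [
    ("Retail Trade", 16, 31),
    ("Construction", 500, 20),
    ("Healthcare", 165, 33),
    ("Mining", 3, 67),
    ("Utilities", 56, 34),
    ("Other", 300, 15),
]


def norm(values):
    """Min-max scale values to the 0-100 range, like the NORM lambda."""
    lo, hi = min(values), max(values)
    return [100 * (v - lo) / (hi - lo) for v in values]


def rank(rows, wu=0.6, wbr=0.8):
    """Return sector names sorted by wu*norm(units) + wbr*norm(bad rate), descending."""
    units = norm([r[1] for r in rows])
    rates = norm([r[2] for r in rows])
    crit = [wu * u + wbr * b for u, b in zip(units, rates)]
    order = sorted(range(len(rows)), key=lambda i: crit[i], reverse=True)
    return [rows[i][0] for i in order]


print(rank(ROWS))                   # weights from the formula above
print(rank(ROWS, wu=1.0, wbr=0.2))  # favor Total Units more strongly
```

With wu = 0.6 and wbr = 0.8, Mining still comes out first in this sketch, because its normalized bad rate is 100. Raising wu relative to wbr (for example wu = 1.0, wbr = 0.2) moves Construction, Other and Healthcare to the top.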
If you put the formula in F2 this would be the output:

How to get the total correct percentage in excel using formula

I am trying to get the percentage correct in Excel. For example, you have 2 errors and 20 documents. 2/20 is like .05, or 5%, that were errors. I want to know how many weren't errors, which is 95%. How do I get 95% using an equation or formula in Excel? I will rate highly whoever can answer this!
You can do the simple maths this way:
(20 - 2) / 20 = 18 / 20, which as a percentage is
18 / 20 * 100 = 90%
Of course, 2 is 1/10th of 20 and therefore 10%, not 5%. Percent means "per 100". Therefore, to be formally correct, you should multiply the numerator and the denominator by 5, giving (2*5)/(20*5) = 10/100 = 0.1 = 10%.
Since 2/20 = 10%, (20-2)/20 must be 90%. Alternatively, 1-(2/20) also inverts the result.
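As a quick check of the arithmetic, here is a small Python sketch; the function name `percent_correct` is just an illustrative choice.

```python
def percent_correct(errors, total):
    """Share of items without errors, as a percentage."""
    return 100 * (total - errors) / total


print(percent_correct(2, 20))  # 90.0
```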

Altering values as a percentage to work with a graph

I have a gauge graph that goes from 0 to 100.
I have divided out my justification points as how they would show on the 0 - 100 graph: -2STDev, -1STDev, Avg, +1STDev, +2STDev. How would I go about transferring incoming values to a 0 - 100 scale to match the graph?
On the graph of 0 - 100:
16 represents -2STDev
33 represents -1STDev
50 represents Average
66 represents +1 STDEV
83 represents +2 STDEV
My current values that I want to format to a scale of 100 to fit the graph are:
-2STDev = 63.9
-1STDev = 66.8
AVG = 69.6
+1STDev = 72.5
+2STDev = 75.4
How would I go about creating a formula to adjust these to a 0 - 100 scale? Of course my incoming value, will have to also follow this formula to be graphed upon these.
You can use the following long formula:
=IF(E6<C1,E6/C1*B1,IF(E6<C2,(E6-C1)/(C2-C1)*(B2-B1)+B1,IF(E6<C3,(E6-C2)/(C3-C2)*(B3-B2)+B2,IF(E6<C4,(E6-C3)/(C4-C3)*(B4-B3)+B3,IF(E6<C5,(E6-C4)/(C5-C4)*(B5-B4)+B4,(E6-C5)/(100-C5)*(100-B5)+B5)))))
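Judging from the references in the formula, C1:C5 hold your current values (63.9 to 75.4), B1:B5 hold the matching graph positions (16 to 83), and E6 holds the incoming value; adjust the references if your data is located elsewhere.
Keep in mind this formula does a linear conversion for values in between each of the points you have.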
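The same piecewise-linear conversion can be sketched in Python. This is an illustration, not a translation of the exact spreadsheet: the names `RAW`, `GAUGE` and `to_gauge` are made up here, the end points (0, 0) and (100, 100) mirror the first and last branches of the formula, and clamping inputs outside 0-100 is an added assumption.

```python
from bisect import bisect_right

# Breakpoints from the question, padded with (0, 0) and (100, 100).
RAW = [0.0, 63.9, 66.8, 69.6, 72.5, 75.4, 100.0]    # incoming values
GAUGE = [0.0, 16.0, 33.0, 50.0, 66.0, 83.0, 100.0]  # positions on the graph


def to_gauge(value):
    """Map an incoming value to the 0-100 gauge by linear interpolation."""
    v = min(max(value, RAW[0]), RAW[-1])  # assumption: clamp out-of-range input
    i = min(bisect_right(RAW, v), len(RAW) - 1)  # index of the segment's right end
    x0, x1 = RAW[i - 1], RAW[i]
    y0, y1 = GAUGE[i - 1], GAUGE[i]
    return y0 + (v - x0) / (x1 - x0) * (y1 - y0)


print(to_gauge(69.6))  # the average lands on 50.0
print(to_gauge(68.2))  # halfway between -1STDev and Avg, about 41.5
```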

Excel Formula, To Calculate a maximum Weight based off a desired minimum profit (GP%)

So I am working on a spreadsheet for a Butchery I manage and have run into a problem.
First off back story: We do $20 packs for certain bulk products that have a min/max weight range.
The Goal is to be able to put in this spreadsheet the desired minimum GP% and from that get a maximum weight based off that minimum profit margin.
For example, a beef steak that costs $17.50 per kilo would have a minimum weight of 680g (at a GP% of 30.30%) and a maximum weight of 790g (at a GP% of 20.50%).
I have been 'googling' all day and banging my head on my desk (as well as experimenting with different formulas). I am starting to think I may have to resort to programming a macro to do this, but I would prefer to achieve it with a formula in the cell so that I can copy-paste it easily down the spreadsheet.
If anyone has a solution or can put me on the right track, that would be awesome.
I think the formula you are looking for is:
your selling price ($20) / your marked-up cost per kilo
where your marked-up cost per kilo (the price per kilo that yields your margin) is:
your cost per kilo / (1 - your margin)
So for 20% expected GP it gives :
= 20 / (17.5 / (1-0.2))
= 20 / 21.875
= 0.914... kilos
Balance is then :
Revenue = 20$
Cost = 0.914 * 17.5 = 16
Margin = 4
Margin % = 20
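The calculation above can be sketched in Python to check other costs and margins. The function name `max_weight_kg` and its parameter names are illustrative, and the sketch assumes, like the answer, that GP% is measured against the $20 pack price with no tax included.

```python
def max_weight_kg(pack_price, cost_per_kg, min_gp):
    """Heaviest pack (in kg) that still earns at least `min_gp` gross profit.

    min_gp is a fraction of the selling price, e.g. 0.20 for 20%.
    """
    price_per_kg = cost_per_kg / (1 - min_gp)  # per-kilo price that yields the margin
    return pack_price / price_per_kg


weight = max_weight_kg(20, 17.5, 0.20)
cost = weight * 17.5
print(round(weight, 3))            # about 0.914 kg
print(round((20 - cost) / 20, 2))  # margin check: 0.2
```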

Find the minimum number of tanks to hold the maximum quantity of wines, at each tank maximum possible capacity

My business is in the wine reselling business, and we have this problem I've been trying to solve. We have 50 - 70 types of wine to be stored at any time, and around 500 tanks of various capacity. Each tank can only hold 1 type of wine. My job is to determine the minimum number of tanks to hold the maximum number of type of wines, each filled as close to its maximum capacity as possible, i.e 100l of wine should not be stored in a 200l tank if 2 tanks of 60l and 40l also exist.
I've been doing the job by hand in Excel and want to try to automate the process, but macros and array formulas quickly get out of hand. I can write a simple program in C or Swift, but I'm stuck at finding a general algorithm. Any pointer on where I can start is much appreciated. A full solution and I will send you a bottle ;)
Edit: for clarification, I do know how many types of wine I have and their total quantity, i.e Pinot at 700l, Merlot 2000l, etc. These change every week. The tanks however have many different capacities (40, 60, 80, 100, 200 liters etc) and change at irregular interval since they have to be taken out for cleaning and replaced. Simply using 70 tanks to hold 70 types is not possible.
Also, total quantity of wine never matches total tanks' capacity, and I need to use the minimum number of tanks to hold the maximum amount of wine. In case of insufficient capacity the amount of wine left over must be smallest possible (they'll spoil quickly). If there is left-over, the amount left over of each type must be proportional to their quantity.
A simplified example of the problem is this:
Wine:
----------
Merlot 100
Pinot 120
Tocai 230
Chardonnay 400
Total: 850L
Tanks:
----------
T1 10
T2 20
T3 60
T4 150
T5 80
T6 80
T7 90
T8 80
T9 50
T10 110
T11 50
T12 50
Total: 830L
This greedy-DP algorithm attempts to perform a proportional split: for example, if you have 700l Pinot, 2000l Merlot and tank capacities 40, 60, 80, 100, 200, that means a total capacity of 480.
700 / (700 + 2000) = 0.26
2000 / (700 + 2000) = 0.74
0.26 * 480 = 125
0.74 * 480 = 355
So we will attempt to store 125l of the Pinot and 355l of the Merlot, to make the storage proportional to the amounts we have.
Obviously this isn't fully possible, because you cannot mix wines, but we should be able to get close enough.
To store the Pinot, the closest would be to use tanks 1 (40l) and 3 (80l), then use the rest for the Merlot.
This can be implemented as a subset sum problem:
d[i] = true if we can make sum i and false otherwise
d[0] = true, false otherwise
sum_of_tanks = 0
for each tank i:
    sum_of_tanks += tank_capacities[i]
    for s = sum_of_tanks down to tank_capacities[i]:
        d[s] = d[s] OR d[s - tank_capacities[i]]
Compute the proportions then run this for each type of wine you have (removing the tanks already chosen, which you can find by using the d array, I can detail if you want). Look around d[computed_proportion] to find the closest sum possible to achieve for each wine type.
This should be fast enough for a few hundred tanks, which I'm guessing don't have capacities larger than a few thousand.
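Here is a minimal Python sketch of the approach described above, under a few stated assumptions: tank capacities are integers, wines are processed in the order given, and the chosen tanks are recovered through a `parent` array kept next to the reachability table (one possible way to backtrack, not necessarily the one the answer had in mind). The names `closest_subset` and `allocate` are made up for this illustration.

```python
def closest_subset(capacities, target):
    """Subset-sum DP: return (best_sum, tank indices) with best_sum closest to target."""
    total = sum(capacities)
    reachable = [False] * (total + 1)
    parent = [None] * (total + 1)  # parent[s]: tank that first made sum s reachable
    reachable[0] = True
    for i, cap in enumerate(capacities):
        # Go downwards so that each tank is used at most once.
        for s in range(total, cap - 1, -1):
            if not reachable[s] and reachable[s - cap]:
                reachable[s] = True
                parent[s] = i
    best = min((s for s in range(total + 1) if reachable[s]),
               key=lambda s: (abs(s - target), s))
    chosen, s = [], best
    while s > 0:  # walk back through the tanks that built this sum
        chosen.append(parent[s])
        s -= capacities[parent[s]]
    return best, sorted(chosen)


def allocate(wines, capacities):
    """Give each wine the tanks whose total is closest to its proportional share."""
    total_wine = sum(wines.values())
    total_cap = sum(capacities)
    remaining = list(range(len(capacities)))  # indices of tanks still free
    plan = {}
    for name, quantity in wines.items():
        target = quantity / total_wine * total_cap
        caps = [capacities[i] for i in remaining]
        _, picked = closest_subset(caps, target)
        plan[name] = [caps[j] for j in picked]
        remaining = [idx for j, idx in enumerate(remaining) if j not in picked]
    return plan


print(allocate({"Pinot": 700, "Merlot": 2000}, [40, 60, 80, 100, 200]))
# {'Pinot': [40, 80], 'Merlot': [60, 100, 200]}
```

The sketch does not cap a wine's target at its actual quantity, so it only makes sense as written when total capacity is below the total amount of wine, as in the example.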

Resources