For load factor, I know it's the total number of elements divided by the space available. For the picture below, at index 2 for example, does it count as 1 spot or 6?
Yes, the load factor is the total number of entries divided by the number of bins. That's the average number of entries stored in each bin of the HashMap. This number should be kept small in order for the HashMap to have expected constant running time for the get(key) and put(key,value) methods.
Each index represents 1 bin of the HashMap, regardless of how many entries are stored in it.
Therefore, in your example (the image you linked to), you have 10 entries and 5 bins, and the load factor is 2.
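For illustration only (not part of the original answer), here is a minimal sketch of the same calculation, assuming a separate-chaining table represented as a list of bins, where each bin is a list of entries:

```python
# Hypothetical separate-chaining table: 5 bins, 10 entries in total.
bins = [
    ["a", "b"],                      # bin 0: 2 entries
    [],                              # bin 1: empty, still counts as 1 bin
    ["c", "d", "e", "f", "g", "h"],  # bin 2: 6 entries, still just 1 bin
    ["i"],                           # bin 3
    ["j"],                           # bin 4
]

total_entries = sum(len(b) for b in bins)   # 10
load_factor = total_entries / len(bins)     # 10 / 5 = 2.0
print(load_factor)
```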
Related
I have a graph in the screenshot, and each peak corresponds to one footstep while walking. Thus, I want to count the total number of peaks (which are higher than 4 in this situation).
How can I do this in Excel, MATLAB, or any other software?
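No answer is recorded for this one, but as a sketch of a common approach, assuming the samples are in a plain list and a "peak" means a local maximum rising above the threshold of 4:

```python
def count_peaks(samples, threshold=4.0):
    """Count local maxima that rise above the threshold."""
    peaks = 0
    for i in range(1, len(samples) - 1):
        is_local_max = samples[i - 1] < samples[i] >= samples[i + 1]
        if samples[i] > threshold and is_local_max:
            peaks += 1
    return peaks

# Made-up values standing in for the plotted signal.
signal = [1.0, 2.5, 4.8, 3.0, 1.2, 5.1, 4.9, 0.8, 3.3, 2.0]
print(count_peaks(signal))  # 2
```

In MATLAB, the Signal Processing Toolbox function findpeaks(x, 'MinPeakHeight', 4) does essentially the same thing; for noisy walking data you would probably also want to enforce a minimum spacing between peaks.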
I have the following table (for example; the real table has 100 rows):
It has a group name column, the number of students in the group (n), and a score.
I would like to build a fourth column, cluster, which divides the groups into 10 deciles of approximately equal size while preserving the score order. So if I have a total of 80 students across all groups, then I'll have 10 clusters of about 8 students each, more or less. The top cluster will consist of the groups with the highest scores.
I hope this makes sense.
My problem is more of an algorithmic one; I'd prefer a solution in Excel/VBA rather than R, simply because I need a more dynamic solution.
I tried to do it manually by sorting the groups by score and then summing the number of students until I got close to a decile of the total number of students, but maybe there is an algorithm more precise and less frustrating than that (see the sketch after the example data below).
Thanks
Example data with desired outcome that I need to calculate
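No answer is shown for this question either; as a sketch of the greedy procedure described above (sort by score, then fill each cluster until it holds roughly a tenth of the students), with the columns group, n, and score assumed from the description:

```python
def assign_clusters(groups, n_clusters=10):
    """groups: list of (group_name, n_students, score) tuples.
    Returns {group_name: cluster}, where cluster 1 holds the highest scores."""
    total_students = sum(n for _, n, _ in groups)
    target = total_students / n_clusters          # ideal students per cluster
    ordered = sorted(groups, key=lambda g: g[2], reverse=True)

    assignment, cluster, filled = {}, 1, 0
    for name, n, _score in ordered:
        assignment[name] = cluster
        filled += n
        # Advance once the cumulative count reaches this cluster's share.
        if filled >= target * cluster and cluster < n_clusters:
            cluster += 1
    return assignment

# Made-up example: 4 groups, 34 students, split into 4 clusters.
groups = [("A", 9, 95), ("B", 7, 90), ("C", 8, 85), ("D", 10, 80)]
print(assign_clusters(groups, n_clusters=4))
# {'A': 1, 'B': 2, 'C': 2, 'D': 3}
```

Because whole groups are kept together, the clusters can only be approximately equal in size, which matches the "more or less" in the question.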
I have 12 items of a certain current value. I have a 'soft' cap of $1,000,000 for these values. Some of the items fall above, and some below this cap level.
I have an amount of money (for this example $900,000) that I want to distribute amongst only the items that fall below the cap (in this example 6 items), with the aim of bringing the value of these items up to but not over the cap value.
If I distribute the $900,000 evenly over these 6 items (each receiving $150,000), you can see that items 2 and 9 would then be over the $1,000,000 cap. So items 2 and 9 should only receive $100,000 each to raise their value to the cap, and the remaining 4 items would then receive an equal share of the remaining pool of money ($700,000 / 4 = $175,000).
So I need a formula that checks every item to see whether it needs a distribution (i.e. it is below the cap) and then portions out the money pool as illustrated above in the desired distribution column.
Note: The pool of money to be distributed can change. Also the number of items below the cap can change. The cap value itself can change.
I am hoping to avoid VBA or Solver because the spreadsheet could be used on other people's computers.
Hopefully this makes sense. Thanks.
EDIT:
So far I have been able to get close by adding a helper column and using the following formula:
=IF(SUM($F$6:F14)=$D$23,0,E15*MIN(D15,($D$23-SUM($F$6:F14))/SUM(E15:$E$18)))
Working example when values are sorted.
This seems to work when the values are sorted in descending order, as shown in the example image above, but it seems to break when the values are more randomly ordered, which is likely to happen (as in the original post).
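As a cross-check of the intended allocation rule (not the formula-only Excel solution the question asks for), here is a sketch of the iterative logic described above; the cap and pool come from the question, while the 12 individual item values are made up so that the arithmetic matches the worked example, with items 2 and 9 sitting $100,000 below the cap:

```python
def distribute(values, pool, cap):
    """Return the top-up each item receives: items at or above the cap get
    nothing, the rest share the pool equally, but no item is pushed above
    the cap; money freed up by capped items is re-shared among the rest."""
    extra = [0.0] * len(values)
    eligible = [i for i, v in enumerate(values) if v < cap]

    while eligible and pool > 1e-9:
        share = pool / len(eligible)
        next_round = []
        for i in eligible:
            headroom = cap - (values[i] + extra[i])
            give = min(share, headroom)
            extra[i] += give
            pool -= give
            if headroom > share:      # still below the cap after this round
                next_round.append(i)
        eligible = next_round
    return extra

# Made-up current values for the 12 items (cap $1,000,000, pool $900,000).
values = [1_200_000, 900_000, 1_050_000, 800_000, 825_000, 1_300_000,
          750_000, 1_100_000, 900_000, 700_000, 1_400_000, 1_250_000]
print(distribute(values, pool=900_000, cap=1_000_000))
# items 2 and 9 get 100,000; the other four eligible items get 175,000 each
```

Note that the loop may need several rounds, because capping one item frees money that can in turn push another item over the cap.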
Just to give you an idea of how Solver can be set up to do a capital budget model, here is one; it also shows Solver and its settings:
I am using PowerPivot with a cost table containing 300,000 different cost types and a calculation table with about 700,000 records. I change the product strings (which can be quite long) to integers in order to make them shorter and to get the RELATED formula to work faster.
With this many records and cost types, would it be better to give all the ID numbers the same number of digits?
So, for example, should I start numbering at 1,000,000 and go up to 1,500,000, or just from 1 to 500,000?
Try saving files with IDs 1-500000 and 1000001-1500000 and compare their file sizes: the difference isn't worth it.
1 to 500,000 is the better option because it takes fewer bytes to store. Having IDs of the same length has no advantage whatsoever.
You will not notice a difference in allocated memory. If you save
1; 2; ... or 1000001; 1000002; ... or 1 abcdefgh; 2 abcdefgh; ... you will find that:
2.14 MB for both 1-64000 and 1000001-1064000 in .xls format*
3.02 MB for 1 abcdefgh; 2 abcdefgh; ...
584 KB on disk (much smaller) for both 1-100000 and 1000001-1100000 in .ods format (the format cannot hold more rows). There is a small difference (596,069 bytes vs 597,486 bytes), but it is negated by the 4 KB cluster size.
From a usability standpoint, go for 1,000,000 to 1,500,000: you are guaranteed the same number of digits, otherwise it is easy to mix up 1234 and 11234. Strongly consider SQLite or a similar database, because 0.5 million rows is pushing the limits of the Excel format.
* The .xls format can store a maximum of 65,536 rows and 256 columns.
1 and 1,000,000 take the same amount of space because the data is not compressed and enough space for an int (a number up to about 4 billion) is allocated.
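A quick way to see that fixed-width point (a sketch using Python's struct module and a 32-bit unsigned integer):

```python
import struct

# Both values occupy the same 4 bytes when stored as a fixed-width
# 32-bit unsigned integer (maximum value a little over 4 billion).
print(len(struct.pack("<I", 1)))          # 4
print(len(struct.pack("<I", 1_000_000)))  # 4
```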
I am not sure whether this is the right place to ask this question, as it is more of a logic question, but there's no harm in asking.
Suppose I have a huge list of data (customers),
and each record has a data_id.
Now I want to split the data in some ratio, let's say a 10:90 split.
Rather than stating a condition such as, for example:
the sum of the digits is even, go to bin 1
the sum of the digits is odd, go to bin 2
or: the sum of the last three digits is x, go to bin 1
the sum of the last three digits is not x, go to bin 2
(a rule like that might result in an uneven split: sometimes it will find more data than needed, which is fine, but sometimes it might not find enough data),
is there a way (probabilistically speaking)
to guarantee that the sample size is always greater than x%?
Thanks
You want to partition your data by a feature that is uniformly distributed, and hash functions are designed to have this property: if you compute a hash of your customer ID and then partition by the first n bits to get 2^n bins, each bin should have approximately the same number of items. (You can then select, say, 90% of your bins to get 90% of the data.) Hope this helps.
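As a rough sketch of that idea, assuming data_id can be rendered as a string and using a stable hash reduced modulo 100 instead of the first n bits (same principle, and it makes an exact 10:90 target easy to express):

```python
import hashlib
from collections import Counter

def assign_bin(data_id, percent_in_bin1=10):
    """Deterministically send about percent_in_bin1 % of IDs to bin 1, the rest to bin 2."""
    digest = hashlib.md5(str(data_id).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100          # roughly uniform over 0..99
    return 1 if bucket < percent_in_bin1 else 2

# Example with made-up customer IDs: the split comes out close to 10:90.
counts = Counter(assign_bin(f"cust-{i}") for i in range(10_000))
print(counts)
```

hashlib is used here rather than Python's built-in hash() because the built-in is randomized between runs for strings, so the assignment would not be reproducible.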