OpenJDK's rehashing mechanism

OpenJDK's rehashing mechanism - hashmap

Found this code on http://www.docjar.com/html/api/java/util/HashMap.java.html after searching for a HashMap implementation.
264 static int hash(int h) {
265 // This function ensures that hashCodes that differ only by
266 // constant multiples at each bit position have a bounded
267 // number of collisions (approximately 8 at default load factor).
268 h ^= (h >>> 20) ^ (h >>> 12);
269 return h ^ (h >>> 7) ^ (h >>> 4);
270 }
Can someone shed some light on this? The comment tells us why this code is here but I would like to understand how this improves a bad hash value and how it guarantees that the positions have bounded number of collisions. What do these magic numbers mean?

In order for it to make any sense it has to be combined with an understanding of how HashMap allocates things in to buckets. This is the trivial function by which a bucket index is chosen:
static int indexFor(int h, int length) {
return h & (length-1);
}
So you can see, that with a default table size of 16, only the 4 least significant bits of the hash actually matter for allocating buckets! (16 - 1 = 15, which masks the hash by 1111b)
This could clearly be bad news if your hashCode function returned:
10101100110101010101111010111111
01111100010111011001111010111111
11000000010100000001111010111111
//etc etc etc
Such a hash function would not likely be "bad" in any way that is visible to its author. But if you combine it with the way the map allocates buckets, boom, MapFail(tm).
If you keep in mind that h is a 32 bit number, those are not magic numbers at all. It is systematically xoring the most significant bits of the number rightward into the least significant bits. The purpose is so that "differences" in the number that occur anywhere "across" it when viewed in binary become visible down in the least significant bits.
Collisions become bounded because the number of different numbers that have the same relevant LSBs is now significantly bounded because any differences that occur anywhere in the binary representation are compressed into the bits that matter for bucket-ing.

Related

Why does this Hedgehog generator not shrink faster?

I made a Hedgehog generator that generates arbitrary 256-bit values in the following way:
genWord256 :: Gen Word256
genWord256 = do
bytes <- Gen.integral (Range.linear 0 31)
let lo = 2 ^ (8 * bytes)
hi = 2 ^ (8 * (bytes + 1))
pred <$> Gen.integral (Range.constant lo hi)
Making the size parameter determine the number of bytes in the number, I think, makes sense to my application. However, evaluating how this generator shrinks, and applying ceiling . logBase 2 to this, my question is this:
Why does Hedgehog decide to emphasise on the vicinity of its initial outcome? Have I somehow misunderstood what is meant by "a range which is unaffected by the size parameter"? (Range.constant) I would have thought that any shrinkage here must have a smaller number of bits.
λ> Gen.print genWord256
=== Outcome ===
68126922926972638
=== Shrinks ===
112 -- 7 bits
4035711763 -- 32 bits
106639875637011 -- 47 bits
281474976710655 -- 48 bits
34204198951841647 -- 55 bits
51165560939407143 -- 56 bits
59646241933189891 -- ...
67994412286444783 -- ...
... 50 shrinks omitted ...
68126922926972637 -- 56 bits

The output that you showed makes perfect sense to me.
First and foremost, just to make sure that we are on the same page, what Gen.print shows is not a sequence of consequitive shrinks assuming failure, it is just the first level of the shrink tree.
So, in your example, it generated 68126922926972638, which is a 7-byte value. Assuming that this fails, it will try 112, a 1-byte value. This is as small as possible, given that it corresponds to the value 0 at your first generator. If this test fails too, it will move to the second level of the shrink tree, which we don’t see, but it would be reasonable to assume that it will focus on 1-byte values and try to shrink them towards lo.
However, if the test at 112 succeeds, it will move to the second value on your list, 4035711763, which is a 4-byte value. And note that 4 bytes correspond to 3 in your first generator, which is right in the middle between 0 and 6, because it is doing binary search on the size of your inputs. If the test succeeds again, it will continue moving closer to the original outcome, that is “emphasise on the vicinity of its initial outcome”, and that is what we see in your output.

Selecting parameters for string hashing

I was recently reading an article on string hashing. We can hash a string by converting a string into a polynomial.
H(s1s2s3 ...sn) = (s1 + s2*p + s3*(p^2) + ··· + sn*(p^n−1)) mod M.
What are the constraints on p and M so that the probability of collision decreases?
A good requirement for a hash function on strings is that it should be difficult to find a
pair of different strings, preferably of the same length n, that have equal fingerprints. This
excludes the choice of M < n. Indeed, in this case at some point the powers of p corresponding
to respective symbols of the string start to repeat.
Similarly, if gcd(M, p) > 1 then powers of p modulo M may repeat for
exponents smaller than n. The safest choice is to set p as one of
the generators of the group U(ZM) – the group of all integers
relatively prime to M under multiplication modulo M.
I am not able to understand the above constraints. How selecting M < n and gcd(M,p) > 1 increases collision? Can somebody explain these two with some examples? I just need a basic understanding of these.
In addition, if anyone can focus on upper and lower bounds of M, it will be more than enough.
The above facts has been taken from the following article string hashing mit.

The "correct" answers to these questions involve some amount of number theory, but it can often be instructive to look at some extreme cases to see why the constraints might be useful.
For example, let's look at why we want M ≥ n. As an extreme case, let's pick M = 2 and n = 4. Then look at the numbers p0 mod 2, p1 mod 2, p2 mod 2, and p3 mod 2. Because there are four numbers here and only two possible remainders, by the pigeonhole principle we know that at least two of these numbers must be equal. Let's assume, for simplicity, that p0 and p1 are the same. This means that the hash function will return the same hash code for any two strings whose first two characters have been swapped, since those characters are multiplied by the same amount, which isn't a desirable property of a hash function. More generally, the reason why we want M ≥ n is so that the values p0, p1, ..., pn-1 at least have the possibility of being distinct. If M < n, there will just be too many powers of p for them to all be unique.
Now, let's think about why we want gcd(M, p) = 1. As an extreme case, suppose we pick p such that gcd(M, p) = M (that is, we pick p = M). Then
s0p0 + s1p1 + s2p2 + ... + sn-1pn-1 (mod M)
= s0M0 + s1M1 + s2M2 + ... + sn-1Mn-1 (mod M)
= s0
Oops, that's no good - that makes our hash code exactly equal to the first character of the string. This means that if p isn't coprime with M (that is, if gcd(M, p) ≠ 1), you run the risk of certain characters being "modded out" of the hash code, increasing the collision probability.

How selecting M < n and gcd(M,p) > 1 increases collision?
In your hash function formula, M might reasonably be used to restrict the hash result to a specific bit-width: e.g. M=216 for a 16-bit hash, M=232 for a 32-bit hash, M=2^64 for a 64-bit hash. Usually, a mod/% operation is not actually needed in an implementation, as using the desired size of unsigned integer for the hash calculation inherently performs that function.
I don't recommend it, but sometimes you do see people describing hash functions that are so exclusively coupled to the size of a specific hash table that they mod the results directly to the table size.
The text you quote from says:
A good requirement for a hash function on strings is that it should be difficult to find a pair of different strings, preferably of the same length n, that have equal fingerprints. This excludes the choice of M < n.
This seems a little silly in three separate regards. Firstly, it implies that hashing a long passage of text requires a massively long hash value, when practically it's the number of distinct passages of text you need to hash that's best considered when selecting M.
More specifically, if you have V distinct values to hash with a good general purpose hash function, you'll get dramatically less collisions of the hash values if your hash function produces at least V2 distinct hash values. For example, if you are hashing 1000 values (~210), you want M to be at least 1 million (i.e. at least 2*10 = 20-bit hash values, which is fine to round up to 32-bit but ideally don't settle for 16-bit). Read up on the Birthday Problem for related insights.
Secondly, given n is the number of characters, the number of potential values (i.e. distinct inputs) is the number of distinct values any specific character can take, raised to the power n. The former is likely somewhere from 26 to 256 values, depending on whether the hash supports only letters, or say alphanumeric input, or standard- vs. extended-ASCII and control characters etc., or even more for Unicode. The way "excludes the choice of M < n" implies any relevant linear relationship between M and n is bogus; if anything, it's as M drops below the number of distinct potential input values that it increasingly promotes collisions, but again it's the actual number of distinct inputs that tends to matter much, much more.
Thirdly, "preferably of the same length n" - why's that important? As far as I can see, it's not.
I've nothing to add to templatetypedef's discussion on gcd.

cmm call format for foreign primop (integer-gmp example)

I have been checking out integer-gmp source code to understand how foreign primops can be implemented in terms of cmm as documented on GHC Primops page. I am aware of techniques to implement them using llvm hack or fvia-C/gcc - this is more of a learning experience for me to understand this third approach that interger-gmp library uses.
So, I looked up CMM tutorial on MSFT page (pdf link), went through GHC CMM page, and still there are some unanswered questions (hard to keep all those concepts in head without digging into CMM which is what I am doing now). There is this code fragment from integer-bmp cmm file:
integer_cmm_int2Integerzh (W_ val)
{
W_ s, p; /* to avoid aliasing */
ALLOC_PRIM_N (SIZEOF_StgArrWords + WDS(1), integer_cmm_int2Integerzh, val);
p = Hp - SIZEOF_StgArrWords;
SET_HDR(p, stg_ARR_WORDS_info, CCCS);
StgArrWords_bytes(p) = SIZEOF_W;
/* mpz_set_si is inlined here, makes things simpler */
if (%lt(val,0)) {
s = -1;
Hp(0) = -val;
} else {
if (%gt(val,0)) {
s = 1;
Hp(0) = val;
} else {
s = 0;
}
}
/* returns (# size :: Int#,
data :: ByteArray#
#)
*/
return (s,p);
}
As defined in ghc cmm header:
W_ is alias for word.
ALLOC_PRIM_N is a function for allocating memory on the heap for primitive object.
Sp(n) and Hp(n) are defined as below (comments are mine):
#define WDS(n) ((n)*SIZEOF_W) //WDS(n) calculates n*sizeof(Word)
#define Sp(n) W_[Sp + WDS(n)]//Sp(n) points to Stackpointer + n word offset?
#define Hp(n) W_[Hp + WDS(n)]//Hp(n) points to Heap pointer + n word offset?
I don't understand lines 5-9 (line 1 is the start in case you have 1/0 confusion). More specifically:
Why is the function call format of ALLOC_PRIM_N (bytes,fun,arg) that way?
Why is p manipulated that way?
The function as I understand it (from looking at function signature in Prim.hs) takes an int, and returns a (int, byte array) (stored in s,p respectively in the code).
For anyone who is wondering about inline call in if block, it is cmm implementation of gmp mpz_init_si function. My guess is if you call a function defined in object file through ccall, it can't be inlined (which makes sense since it is object-code, not intermediate code - LLVM approach seems more suitable for inlining through LLVM IR). So, the optimization was to define a cmm representation of the function to be inlined. Please correct me if this guess is wrong.
Explanation of lines 5-9 will be very much appreciated. I have more questions about other macros defined in integer-gmp file, but it might be too much to ask in one post. If you can answer the question with a Haskell wiki page or a blog (you can post the link as answer), that would be much appreciated (and if you do, I would also appreciate step-by-step walk-through of an integer-gmp cmm macro such as GMP_TAKE2_RET1).

Those lines allocate a new ByteArray# on the Haskell heap, so to understand them you first need to know a bit about how GHC's heap is managed.
Each capability (= OS thread that executes Haskell code) has its own dedicated nursery, an area of the heap into which it makes normal, small allocations like this one. Objects are simply allocated sequentially into this area from low addresses to high addresses until the capability tries to make an allocation which exceeds the remaining space in the nursery, which triggers the garbage collector.
All heap objects are aligned to a multiple of the word size, i.e., 4 bytes on 32-bit systems and 8 bytes on 64-bit systems.
The Cmm-level register Hp points to (the beginning of) the last word which has been allocated in the nursery. HpLim points to the last word which can be allocated in the nursery. (HpLim can also be set to 0 by another thread to stop the world for GC, or to send an asynchronous exception.)
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObjects has information on the layout of individual heap objects. Notably each heap object begins with an info pointer, which (among other things) identifies what sort of heap object it is.
The Haskell type ByteArray# is implemented with the heap object type ARR_WORDS. An ARR_WORDS object just consists of (an info pointer followed by) a size (in bytes) followed by arbitrary data (the payload). The payload is not interpreted by the GC, so it can't store pointers to Haskell heap objects, but it can store anything else. SIZEOF_StgArrWords is the size of the header common to all ARR_WORDS heap objects, and in this case the payload is just a single word, so SIZEOF_StgArrWords + WDS(1) is the amount of space we need to allocate.
ALLOC_PRIM_N (SIZEOF_StgArrWords + WDS(1), integer_cmm_int2Integerzh, val) expands to something like
Hp = Hp + (SIZEOF_StgArrWords + WDS(1));
if (Hp > HpLim) {
HpAlloc = SIZEOF_StgArrWords + WDS(1);
goto stg_gc_prim_n(integer_cmm_int2Integerzh, val);
}
First line increases Hp by the amount to be allocated. Second line checks for heap overflow. Third line records the amount that we tried to allocate, so the GC can undo it. The fourth line calls the GC.
The fourth line is the most interesting. The arguments tell the GC how to restart the thread once garbage collection is done: it should reinvoke integer_cmm_int2Integerzh with argument val. The "_n" in stg_gc_prim_n (and the "_N" in ALLOC_PRIM_N) means that val is a non-pointer argument (in this case an Int#). If val were a pointer to a Haskell heap object, the GC needs to know that it is live (so it doesn't get collected) and to reinvoke our function with the new address of the object. In that case we'd use the _p variant. There are also variants like _pp for multiple pointer arguments, _d for Double# arguments, etc.
After line 5, we've successfully allocated a block of SIZEOF_StgArrWords + WDS(1) bytes and, remember, Hp points to its last word. So, p = Hp - SIZEOF_StgArrWords sets p to the beginning of this block. Lines 8 fills in the info pointer of p, identifying the newly-created heap object as ARR_WORDS. CCCS is the current cost-center stack, used only for profiling. When profiling is enabled each heap object contains an extra field that basically identifies who is responsible for its allocation. In non-profiling builds, there is no CCCS and SET_HDR just sets the info pointer. Finally, line 9 fills in the size field of the ByteArray#. The rest of the function fills in the payload and return the sign value and the ByteArray# object pointer.
So, this ended up being more about the GHC heap than about the Cmm language, but I hope it helps.

Required knowledge
In order to do arithmetic and logical operations computers have digital circuit called ALU (Arithmetic Logic Unit) in their CPU (Central Processing Unit). An ALU loads data from input registers. Processor register is memory storage in L1 cache (data requests within 3 CPU clock ticks) implemented in SRAM(Static Random-Access Memory) located in CPU chip. A processor often contains several kinds of registers, usually differentiated by the number of bits they can hold.
Numbers are expressed in discrete bits can hold finite number of values. Typically numbers have following primitive types exposed by the programming language (in Haskell):
8 bit numbers = 256 unique representable values
16 bit numbers = 65 536 unique representable values
32 bit numbers = 4 294 967 296 unique representable values
64 bit numbers = 18 446 744 073 709 551 616 unique representable values
Fixed-precision arithmetic for those types has been implemented in hardware. Word size refers to the number of bits that can be processed by a computer's CPU in one go. For x86 architecture this is 32 bits and x64 this is 64 bits.
IEEE 754 defines floating point numbers standard for {16, 32, 64, 128} bit numbers. For example 32 bit point number (with 4 294 967 296 unique values) can hold approximate values [-3.402823e38 to 3.402823e38] with accuracy of at least 7 floating point digits.
In addition
Acronym GMP means GNU Multiple Precision Arithmetic Library and adds support for software emulated arbitrary-precision arithmetic's. Glasgow Haskell Compiler Integer implementation uses this.
GMP aims to be faster than any other bignum library for all operand
sizes. Some important factors in doing this are:
Using full words as the basic arithmetic type.
Using different algorithms for different operand sizes; algorithms that are faster for very big numbers are usually slower for small
numbers.
Highly optimized assembly language code for the most important inner loops, specialized for different processors.
Answer
For some Haskell might have slightly hard to comprehend syntax so here is javascript version
var integer_cmm_int2Integerzh = function(word) {
return WORDSIZE == 32
? goog.math.Integer.fromInt(word))
: goog.math.Integer.fromBits([word.getLowBits(), word.getHighBits()]);
};
Where goog is Google Closure library class used is located in Math.Integer. Called functions :
goog.math.Integer.fromInt = function(value) {
if (-128 <= value && value < 128) {
var cachedObj = goog.math.Integer.IntCache_[value];
if (cachedObj) {
return cachedObj;
}
}
var obj = new goog.math.Integer([value | 0], value < 0 ? -1 : 0);
if (-128 <= value && value < 128) {
goog.math.Integer.IntCache_[value] = obj;
}
return obj;
};
goog.math.Integer.fromBits = function(bits) {
var high = bits[bits.length - 1];
return new goog.math.Integer(bits, high & (1 << 31) ? -1 : 0);
};
That is not totally correct as return type should be return (s,p); where
s is value
p is sign
In order to fix this GMP wrapper should be created. This has been done in Haskell to JavaScript compiler project (source link).
Lines 5-9
ALLOC_PRIM_N (SIZEOF_StgArrWords + WDS(1), integer_cmm_int2Integerzh, val);
p = Hp - SIZEOF_StgArrWords;
SET_HDR(p, stg_ARR_WORDS_info, CCCS);
StgArrWords_bytes(p) = SIZEOF_W;
Are as follows
allocates space as new word
creates pointer to it
set pointer value
set pointer type size

Are overlapping sub-arrays of a byte array independent enough to use as hash function(s) for Bloom Filter?

I have the following question in the context of a BloomFilter. BloomFilters need to have k independent hash functions. Let's call these function h1, h2, ... hk. Independent in this context means that their value will have very little correlation (hopefully zero) when applied to the same set. See the Algorithm Description at http://en.wikipedia.org/wiki/Bloom_filter (but of course, you already know that page inside out :).
Now, assume that I want to define my hash functions using some n bits (coming from a crypto function if you must know, but it's not relevant for the question), which are independent from each other themselves. If you want more context you can read http://bitworking.org/news/380/bloom-filter-resources which is doing something similar.
For example, assume I want to define each h as (pardon my pseudo-code):
bytes = MD5(value)
h1 = bytes[0-3] as Integer
h2 = bytes[4-7] as Integer
h3 = bytes[8-11] as Integer
...
Of course we will run out of hash functions very quickly. We only get four in this MD5 example.
One possibility is to let the hash functions overlap with each other and not have the requirement that the four bytes are sequential. That way we has many hash functions as permutations the byte array allows. To keep it simple, what if we defined the hash functions in the following way:
bytes = MD5(value)
h1 = bytes[0-3] as Integer
h2 = bytes[1-4] as Integer
h3 = bytes[2-5] as Integer
...
It is easy to see that in the MD5 case now we have 12 hashing functions instead of four.
Finally, we get to THE question. Are these hashing functions independent? Thanks!
UPDATE: I decided to try to answer the question from a practical point of view so I created a small program that would test the hypothesis. See below.

As is often the case with clever questions, the answer is yes, and no.
Yes, in the sense that there are 16 bits that are not shared between h1 and h2. No, in the senses that are important to you (unless you are only actually using eight bits of the hash function, which I presume you are not).
The issue here is less with dependence between the two functions applied to the same item being inserted and more (in this case, in my opinion) with the functions being applied to multiple items.
Think of it this way. Assume your first example uses g1-g4, and the second uses h1-h4. Two items whose MD5sum (or any other hashing function) overlaps in only 5 consecutive bytes (unlikely, but statistically do-able, especially if you're trying) will stand a chance of colliding if just using h1 and h2, h2 and h3, or h3 and h4. Meanwhile g1-g4 is robust to that possibility.
Now collisions with bloom filters aren't as big a deal as other applications of hash functions, but you should keep in mind that the overlapping bytes do detract from the utility of the hash functions. I'm a little surprised that you need more than four indepdendent hash functions, to be honest.
Also, if you're only using the last 8 bits of each number (256 bit bloom filter) or the last 16 bits (2^16 bit bloom filter), or whatever, then you can 'overlap' the bits that you aren't using with reckless abandon and without risk.
Disclaimer:
I know cryptography pretty well and bloom filters because they are fricking awesome, but my practical knowledge of bloom filters is limited; what you describe may work quite well for your use case.

Running the program below will test the hypothesis with random number generators.
public static void main(String[] args) {
int R = 100, N = 10000, W = 8;
double[] totals = new double[33];
Random r = new Random();
for (int k = 0; k < R; k++) {
// Generate 10,000 random byte arrays
byte[][] bytes = new byte[N][W];
for (int i = 0; i < N; i++) r.nextBytes(bytes[i]);
double[] a1 = new double[N], a2 = new double[N];
for (int i = 0; i <= 32; i++) {
// Extract arrays
for (int j = 0; j < N; j++) {
a1[j] = readInt(bytes[j], 0, 31);
a2[j] = readInt(bytes[j], 32 - i, 31);
}
double c = (new PearsonsCorrelation()).correlation(a1, a2);
totals[i] += c;
}
}
}
The interesting bits is that only when there is only one overlapping bit, the correlation starts to be significant. Below are the pearson correlation coefficients for each number of overlapping bits. We start very low (meaning close to the 0 overlapping case), and get 1 when they fully overlap.
0 -0.001883705757299319
1 -0.0019261826793995395
2 -0.0018466135577488883
3 -0.001499114477250019
4 -0.0010874727770462341
5 -1.1219111699336884E-5
6 -0.001760700583842139
7 3.6545455908216937E-4
8 0.0014823972050436482
9 0.0014809963180788554
10 0.0015226692114697182
11 0.00199027499920776
12 0.001720451344380218
13 -2.0219121772336676E-4
14 6.880004078769847E-4
15 8.605949344202965E-4
16 -0.0025640320027890645
17 -0.002552269654230886
18 -0.002550425130285998
19 -0.002522446787072504
20 -0.00320337678141518
21 -7.554573868921899E-4
22 -6.463448718890875E-4
23 -3.4709181348336335E-4
24 0.0038077518094915912
25 0.0037865326140343815
26 0.0038728464390708982
27 0.0035091958914765407
28 0.005099109955591643
29 0.016993434043779915
30 0.06120260114179265
31 0.25159073855202346
32 1.0
Bottom line: It seems that a shift of one byte (meaning the 24 value above) should be quite safe with respect to hash function generation.

Microsoft.DirectX.Vector3.Normalize() inconsistency

Two ways to normalize a Vector3 object; by calling Vector3.Normalize() and the other by normalizing from scratch:
class Tester {
static Vector3 NormalizeVector(Vector3 v)
{
float l = v.Length();
return new Vector3(v.X / l, v.Y / l, v.Z / l);
}
public static void Main(string[] args)
{
Vector3 v = new Vector3(0.0f, 0.0f, 7.0f);
Vector3 v2 = NormalizeVector(v);
Debug.WriteLine(v2.ToString());
v.Normalize();
Debug.WriteLine(v.ToString());
}
}
The code above produces this:
X: 0
Y: 0
Z: 1
X: 0
Y: 0
Z: 0.9999999
Why?
(Bonus points: Why Me?)

Look how they implemented it (e.g. in asm).
Maybe they wanted to be faster and produced something like:
l = 1 / v.length();
return new Vector3(v.X * l, v.Y * l, v.Z * l);
to trade 2 divisions against 3 multiplications (because they thought mults were faster than divs (which is for modern fpus most often not valid)). This introduced one level more of operation, so the less precision.
This would be the often cited "premature optimization".

Don't care about this. There's always some error involved when using floats. If you're curious, try changing to double and see if this still happens.

You should expect this when using floats, the basic reason being that the computer processes in binary and this doesn't map exactly to decimal.
For an intuitive example of issues between different bases consider the fraction 1/3. It cannot be represented exactly in Decimal (it's 0.333333.....) but can be in Terniary (as 0.1).
Generally these issues are a lot less obvious with doubles, at the expense of computing costs (double the number of bits to manipulate). However in view of the fact that a float level of precision was enough to get man to the moon then you really shouldn't obsess :-)
These issues are sort of computer theory 101 (as opposed to programming 101 - which you're obviously well beyond), and if your heading towards Direct X code where similar things can come up regularly I'd suggest it might be a good idea to pick up a basic computer theory book and read it quickly.

You have here an interesting discussion about String formatting of floats.
Just for reference:
Your number requires 24 bits to be represented, which means that you are using up the whole mantissa of a float (23bits + 1 implied bit).
Single.ToString () is ultimately implemented by a native function, so I cannot tell for sure what is going on, but my guess is that it uses the last digit to round the whole mantissa.
The reason behind this could be that you often get numbers that cannot be represented exactly in binary, so you would get a long mantissa; for instance, 0.01 is represented internally as 0.00999... as you can see by writing:
float f = 0.01f;
Console.WriteLine ("{0:G}", f);
Console.WriteLine ("{0:G}", (double) f);
by rounding at the seventh digit, you will get back "0.01", which is what you would have expected.
For what seen above, numbers with only 7 digits will not show this problem, as you already saw.
Just to be clear: the rounding is taking place only when you convert your number to a string: your calculations, if any, will use all the available bits.
Floats have a precision of 7 digits externally (9 internally), so if you go above that then rounding (with potential quirks) is automatic.
If you drop the float down to 7 digits (for instance, 1 to the left, 6 to the right) then it will work out and the string conversion will as well.
As for the bonus points:
Why you ? Because this code was 'eager to blow on you'.
(Vulcan... blow... ok.
Lamest.
Punt.
Ever)

If your code is broken by minute floating point rounding errors, then I'm afraid you need to fix it, as they're just a fact of life.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string