HEX2OCT formula in MS Excel returns incorrect result

While converting the hexadecimal value "FFFFFFFF00" into an octal value using HEX2OCT in MS Excel, it should return an error value as per the rules mentioned here:
If number is negative, HEX2OCT ignores places and returns a 10-character octal number.
If number is negative, it cannot be less than FFE0000000, and if number is positive, it cannot be greater than 1FFFFFFF.
If number is not a valid hexadecimal number, HEX2OCT returns the #NUM! error value.
If HEX2OCT requires more than places characters, it returns the #NUM! error value.
If places is not an integer, it is truncated.
If places is nonnumeric, HEX2OCT returns the #VALUE! error value.
If places is negative, HEX2OCT returns the #NUM! error value.
But Excel computes and returns "7777777400", apparently without applying the rules/remarks mentioned in the link.
For example:
As per the Excel rule, if number is positive, it cannot be greater than 1FFFFFFF (hex) <-> 3777777777 (oct) <-> 536870911 (decimal).
But when calculating HEX2OCT for FFFFFFFF00 (hex), Excel returns 7777777400 (oct), even though FFFFFFFF00 is 1099511627520 in decimal.
Here the hex value FFFFFFFF00 is greater than 1FFFFFFF, but MS Excel does not return the error value; instead it returns the converted octal value.
Can anyone explain why?

FFFFFFFF00 is actually well within the range of HEX2OCT because it is a negative number.
According to that documentation, the most negative number it can handle is FFE0000000, which when converted to decimal is -536870912. Converting your "big" hex over to decimal yields -256.
The reason the value of FFFFFFFF00 looks so big is because it's a negative number. The first bit is set to 1 (when converted to binary) which signifies that the number is negative. Negatives are computed in binary using two's complement which is found by flipping each bit and then adding 1 to the number.
Undoing the two's complement:
For your big number, the binary representation is:
1111111111111111111111111111111100000000
Subtracting 1:
1111111111111111111111111111111011111111
Flipping all the bits:
0000000000000000000000000000000100000000
Which is 256, so the original value is -256.
So basically, if the hex looks big but the first bit is 1 (when converted to binary), then it's actually a small negative number and well within the range of allowable values.
Lastly, when you HEX2OCT you don't get a negative sign for these because we are still not in decimal notation. The first bit of your octal is still a 1 (when converted to binary) since it's still the same number, just represented in a different counting system.
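If it helps to see the arithmetic spelled out, here is a minimal Python sketch of that interpretation (purely illustrative, not Excel's code; the name hex2oct_like is mine): the input is read as a 40-bit two's-complement number, and negative results come back as the 10-character (30-bit) octal pattern the documentation describes.

def hex2oct_like(hex_str):
    value = int(hex_str, 16)
    if value >= 1 << 39:                # the 40-bit sign bit is set
        value -= 1 << 40                # undo two's complement -> negative value
    if not -(1 << 29) <= value <= (1 << 29) - 1:
        return "#NUM!"                  # outside FFE0000000 .. 1FFFFFFF
    return format(value & ((1 << 30) - 1), 'o')   # 10-character (30-bit) octal pattern

print(int("FFFFFFFF00", 16) - (1 << 40))   # -256: the signed interpretation
print(hex2oct_like("FFFFFFFF00"))          # 7777777400, matching Excel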

The clue lies earlier in the documentation page you quote:
The HEX2OCT function syntax has the following arguments:
Number    Required. The hexadecimal number you want to convert. Number cannot contain more than 10 characters. The most significant bit of number is the sign bit. The remaining 39 bits are magnitude bits. Negative numbers are represented using two's-complement notation.
The hex value FFFFFFFF00 corresponds to the binary value
1111 1111 1111 1111 1111 1111 1111 1111 0000 0000
and as the documentation says, "the most significant bit is the sign bit ... two's complement notation". So this value represents a negative number. By the rules of two's complement, it actually represents -256. And this is fine, because it is not "less than FFE0000000", as FFE0000000 is -536870912.
If you actually want to treat FFFFFFFF00 as an unsigned quantity, and get the octal representation of decimal 1099511627520, you'll need to use another method.
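If you only need the conversion outside Excel, here is a quick Python sketch (just for illustration) of the unsigned reading:

value = int("FFFFFFFF00", 16)   # 1099511627520, treating the input as unsigned
print(format(value, 'o'))       # 17777777777400, the unsigned octal representation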


Why do VBA and Excel disagree on whether two cells are equal? [duplicate]

I am trying to compare two cells in a table:
The column "MR" is calculated using the formula =ABS([#Value]-A1) to determine the moving range of the column "Value". The values in the "Value" column are not rounded. The highlighted cells in the "MR" column (B3 and B4) are equal. I can enter the formula =B3=B4 into a cell and Excel says that B3 is equal to B4.
But when I compare them in VBA, VBA says that B4 is greater than B3. I can select cell B3 and enter the following into the Immediate Window ? selection.value = selection.offset(1).value. That statement evaluates to false.
I tried removing the absolute value from the formula thinking that might have had something to do with it, but VBA still says they aren't equal.
I tried adding another row where Value=1.78 so MR=0.18. Interestingly, the MR in the new row (B5) is equal to B3, but is not equal to B4.
I then tried increasing the decimal of A4 to match the other values, and now VBA says they are equal. But when I added the absolute value back into the formula, VBA again says they are not equal. I removed the absolute value again and now VBA is saying they are not equal.
Why is VBA telling me the cells are not equal when Excel says they are? How can I reliably handle this situation through VBA going forward?
The problem is that IEEE 754 floating-point arithmetic has finite precision by design, so most decimal values cannot be stored exactly. Virtually every programming language suffers because of this.
IEEE 754 is an extremely complex topic; when you have studied it for months and believe you understand it fully, you are probably still fooling yourself!
Accurate floating point value comparisons are difficult and error prone. Think long and hard before attempting to compare floating point numbers!
The Excel program gets around the issue by cheating on the application side. VBA on the other hand follows the IEEE 754 spec for Double Precision (binary64) faithfully.
A Double value is represented in memory using 64 bits. These 64 bits are split into three distinct fields that are used in binary scientific notation:
The SIGN bit (1 bit to represent the sign of the value: pos/neg)
The EXPONENT (11 bits, biased in value by +1023)
The MANTISSA (53 bits, 52 bits stored + 1 bit implied)
The mantissa in this system leverages the fact that all normalized binary numbers begin with a leading 1, so that 1 is not stored in the bit pattern. It is implied, increasing the mantissa precision to 53 bits for normal values.
The math works like this: Stored Value = SIGN VALUE * 2^UNBIASED EXPONENT * MANTISSA
Note that a stored value of 1 for the sign bit denotes a negative SIGN VALUE (-1) while a 0 denotes a positive SIGN VALUE (+1). The formula is SIGN VALUE = (-1) ^ (sign bit).
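To make this concrete, here is a small Python sketch that pulls the three fields out of a value (Python floats use the same IEEE 754 binary64 format; the helper name double_fields is mine):

import struct

def double_fields(x):
    bits = struct.unpack('>Q', struct.pack('>d', x))[0]   # the raw 64-bit pattern
    sign     = bits >> 63                                  # 1 sign bit
    exponent = (bits >> 52) & 0x7FF                        # 11 biased exponent bits
    mantissa = bits & ((1 << 52) - 1)                      # 52 stored mantissa bits
    return bits, sign, exponent, mantissa

bits, s, e, m = double_fields(1.24)
print(hex(bits))                                  # 0x3ff3d70a3d70a3d7, as discussed below
print(s, format(e, '011b'), format(m, '052b'))
# For normal values: Stored Value = (-1)**s * 2**(e - 1023) * (1 + m / 2**52)
print((-1)**s * 2**(e - 1023) * (1 + m / 2**52))  # 1.24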
The problem always boils down to the same thing.
The vast majority of real numbers cannot be expressed precisely within this system, which introduces small rounding errors that propagate like weeds.
It may help to think of this system as a grid of regularly spaced points. The system can represent ONLY the point-values and NONE of the real numbers between the points. All values assigned to a float will be rounded to one of the point-values (usually the closest point, but there are modes that enforce rounding upwards to the next highest point, or rounding downwards). Conducting any calculation on a floating-point value virtually guarantees the resulting value will require rounding.
To accent the obvious, there are an infinite number of real numbers between adjacent representable point-values on this grid; and all of them are rounded to the discrete grid-points.
To make matters worse, the gap size doubles at every power of two as the grid expands away from true zero (in both directions). For example, the gap length between grid points for values in the range of 2 to 4 is twice as large as it is for values in the range of 1 to 2. When representing values with large enough magnitudes, the grid gap length becomes massive, but closer to true zero, it is minuscule.
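You can watch the gap doubling directly; a short Python illustration (math.ulp reports the gap to the next representable double and requires Python 3.9+):

import math

for x in (1.0, 1.5, 2.0, 4.0, 1e16):
    print(x, math.ulp(x))
# 1.0   2.220446049250313e-16
# 1.5   2.220446049250313e-16
# 2.0   4.440892098500626e-16   <- the gap doubles at the power of two
# 4.0   8.881784197001252e-16
# 1e16  2.0                     <- far from zero the gap is larger than 1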
With your example numbers...
1.24 is represented with the following binary:
Sign bit = 0
Exponent = 01111111111
Mantissa = 0011110101110000101000111101011100001010001111010111
The Hex pattern over the full 64 bits is precisely: 3FF3D70A3D70A3D7.
The precision is derived exclusively from the 53-bit mantissa and the exact decimal value from the binary is:
0.2399999999999999911182158029987476766109466552734375
In this instance a leading integer of 1 is implied by the hidden bit associated with the mantissa and so the complete decimal value is:
1.2399999999999999911182158029987476766109466552734375
Now notice that this is not precisely 1.24 and that is the entire problem.
Let's examine 1.42:
Sign bit = 0
Exponent = 01111111111
Mantissa = 0110101110000101000111101011100001010001111010111000
The Hex pattern over the full 64 bits is precisely: 3FF6B851EB851EB8.
With the implied 1 the complete decimal value is stored as:
1.4199999999999999289457264239899814128875732421875000
And again, not precisely 1.42.
Now, let's examine 1.6:
Sign bit = 0
Exponent = 01111111111
Mantissa = 1001100110011001100110011001100110011001100110011010
The Hex pattern over the full 64 bits is precisely: 3FF999999999999A.
Notice the repeating binary fraction in this case, which is truncated and rounded when the mantissa bits run out? Obviously 1.6, when represented in binary (base 2), can never be precisely accurate, in the same way that 1/3 can never be accurately represented with a finite number of digits in decimal (base 10): 0.333333333333... has to stop somewhere, and wherever it stops it is not exactly 1/3.
With the implied 1 the complete decimal value is stored as:
1.6000000000000000888178419700125232338905334472656250
Not exactly 1.6 but closer than the others!
Now let's subtract the full stored double precision representations:
1.60 - 1.42 = 0.18000000000000015987
1.42 - 1.24 = 0.17999999999999993782
So as you can see, they are not equal at all.
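The same mismatch is easy to reproduce in any language that uses binary64 doubles; for example, a quick Python check (Decimal(x) shows the exact value a float actually stores):

from decimal import Decimal

print(1.60 - 1.42 == 1.42 - 1.24)   # False
print(Decimal(1.60 - 1.42))         # the exact stored value of the left-hand result
print(Decimal(1.42 - 1.24))         # the exact stored value of the right-hand result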
The usual way to work around this is threshold testing, basically an inspection to see if two values are close enough... and that depends on you and your requirements. Be forewarned, effective threshold testing is way harder than it appears at first glance.
Here is a function to help you get started comparing two Double Precision numbers. It handles many situations well but not all because no function can.
Function Roughly(a#, b#, Optional within# = 0.00001) As Boolean
    Dim d#, x#, y#, z#
    Const TINY# = 1.17549435E-38 'SINGLE_MIN

    'Trivially equal (also handles 0 = 0)
    If a = b Then Roughly = True: Exit Function

    x = Abs(a): y = Abs(b): d = Abs(a - b)

    If a <> 0# Then
        If b <> 0# Then
            z = x + y
            If z > TINY Then
                'Relative test: compare the difference to the combined magnitude
                Roughly = d / z < within
                Exit Function
            End If
        End If
    End If

    'One operand is zero (or both are vanishingly small): absolute test
    Roughly = d < within * TINY
End Function
The idea here is to have the function return True if the two Doubles are Roughly the same Within a certain margin:
MsgBox Roughly(3.14159, 3.141591) '<---displays True
The Within margin defaults to 0.00001, but you can pass whatever margin you need.
And while we know that:
MsgBox 1.60 - 1.42 = 1.42 - 1.24 '<---displays False
Consider the utility of this:
MsgBox Roughly(1.60 - 1.42, 1.42 - 1.24) '<---displays True
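For what it's worth, Python's standard library exposes the same idea as math.isclose, which combines a relative tolerance with an absolute one for values near zero:

import math

print(math.isclose(3.14159, 3.141591, rel_tol=0.00001))   # True; rel_tol plays the role of "within"
print(math.isclose(1.60 - 1.42, 1.42 - 1.24))             # True with the default rel_tol of 1e-9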
chris neilsen linked to an interesting Microsoft page about Excel and IEEE 754.
And please read David Goldberg's seminal What Every Computer Scientist Should Know About Floating-Point Arithmetic. It changed the way I understood floating point numbers.

Restrict floats to allotted padding while parsing as string

I would like to print a series of floats with varying numbers of digits to the left of the decimal place. I would like these numbers to exactly fill a padding with blank spaces, digits, and a decimal point.
Paraphrasing the data and code I have now:
floats = [321.1234561, 21.1234561, 1.1234561, 0.123456, 0.02345, 0.0034, 0.0004567]
for number in floats:
    print('{:>8.6f}'.format(number))
This outputs
321.123456
21.123456
1.123456
0.123456
0.023450
0.003400
0.000457
I am looking for a way to print the following in a for loop, assuming I don't know in advance how many digits will be to the left of the decimal place, and that the number of digits to the left never exceeds the padding, which is 8 for this example.
321.1234
21.12345
1.123456
0.123456
0.02345
0.0034
0.000457
Similar questions have been asked about printing floating-point numbers with a certain width, but the width they were talking about appeared to be the precision rather than the total number of characters used to print the number.
Edit:
I have added a number to the end of the list for the following reason. The use of the specifier 'g' with 7 significant figures was recommended by attdona. This prevents the padding from being exceeded for numbers greater than or equal to 1 but not for numbers less than 1 with precision greater than 6. Using {:>8.7g} instead gives
321.1235
21.12346
1.123456
0.123456
0.02345
0.0034
0.0004567
Where the only one that exceeds the padding is the newly added one.
Use the General format type specifier g:
'{:>8.7g}'.format(number)
reference: https://docs.python.org/3/library/string.html#format-specification-mini-language
Update: For small numbers this format exceeds the 8-character padding. In this case you may adopt a mixed approach, but keep in mind that very small numbers will round to zero:
for number in floats:
    fstr = '{:>8.7g}'.format(number)
    if len(fstr) > 8:
        fstr = '{:>8.6f}'.format(number)
    print(fstr)
for i in floats:
    # width 8; precision = 8 minus the integer digits minus the decimal point, trailing zeros stripped
    print('{:>8}'.format(f'{i:{8}.{8-len(str(int(i)))-1}f}'.rstrip('0')))
321.1235
21.12346
1.123456
0.123456
0.02345
0.0034

How to obtain the hex value 0xffff corresponding to the decimal value -0.000061 in the table below?

Right at the beginning of the page The OpenType Font File you'll find this table, with examples of the F2DOT14 format: a 16-bit signed fixed-point number with the low 14 bits representing the fraction.
I couldn't obtain the hex value 0xffff for the decimal -0.000061. By the way the mantissa -1 seems to be wrong and the value for the fraction should be 1/16384, instead of 16383/16384, unless I'm missing something related to the two's complement notation used to express a negative value in code.
The mantissa and fraction values listed are entirely correct: the F2DOT14 field encodes numbers as the arithmetic computation mantissa + fraction, not as "signed mantissa with unsigned concatenated fraction remainder".
As such, if you want -0.000061, you have to start with the signed integer -1 in the first two bits (11) and then add the positive value 16383/16384 in the last 14 bits (11111111111111), such that mantissa + fraction = -1 + 16383/16384 = -1/16384, which in turn is encoded as the 16-bit value 0xFFFF.
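A small Python sketch of that arithmetic, purely for illustration (the helper names f2dot14_decode and f2dot14_encode are mine, not from the spec); an F2DOT14 field is simply a signed 16-bit integer scaled by 2**14:

def f2dot14_decode(raw):
    # raw is the 16-bit field; undo two's complement, then divide by 2**14
    if raw >= 0x8000:
        raw -= 0x10000
    return raw / 16384.0

def f2dot14_encode(value):
    # round to the nearest multiple of 1/16384 and keep the low 16 bits
    return round(value * 16384) & 0xFFFF

print(f2dot14_decode(0xFFFF))            # -6.103515625e-05, i.e. about -0.000061
print(hex(f2dot14_encode(-1 / 16384)))   # 0xffff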

What does $ with a numeric value mean in Delphi

What does it mean, in Delphi, when I see a command like this:
char($23)
What does the dollar symbol mean in this context?
The dollar symbol indicates that what follows is a hex value.
ShowMessage(Char($23)); shows #.
The $ symbol is used to prefix a hexadecimal literal. The documentation says:
Numerals
Integer and real constants can be represented in decimal notation as sequences of digits without commas or spaces, and prefixed with the + or - operator to indicate sign. Values default to positive (so that, for example, 67258 is equivalent to +67258) and must be within the range of the largest predefined real or integer type.
Numerals with decimal points or exponents denote reals, while other numerals denote integers. When the character E or e occurs within a real, it means "times ten to the power of". For example, 7E2 means 7 * 10^2, and 12.25e+6 and 12.25e6 both mean 12.25 * 10^6.
The dollar-sign prefix indicates a hexadecimal numeral, for example, $8F. Hexadecimal numbers without a preceding - unary operator are taken to be positive values. During an assignment, if a hexadecimal value lies outside the range of the receiving type an error is raised, except in the case of the Integer (32-bit integer) where a warning is raised. In this case, values exceeding the positive range for Integer are taken to be negative numbers in a manner consistent with two's complement integer representation.
So, in your example, $23 is the number whose hexadecimal representation is 23. That number has decimal representation 35, so you can write:
Assert($23 = 35);
Char($23) produces a character; for example, Char(13) is the carriage return character, commonly used as an end-of-line marker.

how do you convert a double to a string?

I know that most programming languages have functions built in for doing that for you, but how do those functions work?
The javadoc about the Double toString() method is quite comprehensive:
Creates a string representation of the double argument. All characters mentioned below are ASCII characters.
If the argument is NaN, the result is the string "NaN".
Otherwise, the result is a string that represents the sign and magnitude (absolute value) of the argument. If the sign is negative, the first character of the result is '-'; if the sign is positive, no sign character appears in the result. As for the magnitude m:
If m is infinity, it is represented by the characters "Infinity"; thus, positive infinity produces the result "Infinity" and negative infinity produces the result "-Infinity".
If m is zero, it is represented by the characters "0.0"; thus, negative zero produces the result "-0.0" and positive zero produces the result "0.0".
If m is greater than or equal to 10^-3 but less than 10^7, then it is represented as the integer part of m, in decimal form with no leading zeroes, followed by '.', followed by one or more decimal digits representing the fractional part of m.
If m is less than 10^-3 or not less than 10^7, then it is represented in so-called "computerized scientific notation." Let n be the unique integer such that 10^n <= m < 10^(n+1); then let a be the mathematically exact quotient of m and 10^n so that 1 <= a < 10. The magnitude is then represented as the integer part of a, as a single decimal digit, followed by '.', followed by decimal digits representing the fractional part of a, followed by the letter 'E', followed by a representation of n as a decimal integer, as produced by the method Integer.toString(int).
How many digits must be printed for the fractional part of m or a? There must be at least one digit to represent the fractional part, and beyond that as many, but only as many, more digits as are needed to uniquely distinguish the argument value from adjacent values of type double. That is, suppose that x is the exact mathematical value represented by the decimal representation produced by this method for a finite nonzero argument d. Then d must be the double value nearest to x; or if two double values are equally close to x, then d must be one of them and the least significant bit of the significand of d must be 0.
Is that enough? Otherwise you might like to look up the implementation too...
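That "as many digits as needed to uniquely distinguish" rule is the round-trip property. A quick illustration in Python, whose repr has followed the same shortest-round-trip idea since 3.1:

x = 0.1
print(repr(x))               # '0.1' -- the shortest string that maps back to exactly x
print(float(repr(x)) == x)   # True: the printed form round-trips
print(format(x, '.55f'))     # the value actually stored is not exactly 0.1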
A simple (but non-generic, naïve and slow) way, sketched in code below:
convert the number to an integer, then divide this value by 10 stepwise to find out its digits in reverse order. Concatenate them together and you have the integer representation.
subtract the integer from the original number, now multiply by 10 stepwise and find the digits after the decimal point. Concatenate the first string with a point and this second string.
This has a few problems, of course:
slow as hell;
doesn't work for negative numbers;
won't give you exponential notation for very small or large numbers.
All in all, it's an idea, but not a very good one; I suspect there are no programming languages that do this.
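Purely for illustration, here is a rough Python sketch of that naive approach, assuming a non-negative value and a fixed number of fractional digits; it exhibits exactly the problems listed above:

def naive_double_to_string(x, frac_digits=10):
    # integer part: pull digits off in reverse by repeated division by 10
    int_part = int(x)
    digits = []
    n = int_part
    while True:
        digits.append(chr(ord('0') + n % 10))
        n //= 10
        if n == 0:
            break
    result = ''.join(reversed(digits)) + '.'
    # fractional part: repeatedly multiply by 10 and peel off the next digit
    frac = x - int_part
    for _ in range(frac_digits):
        frac *= 10
        d = int(frac)
        result += chr(ord('0') + d)
        frac -= d
    return result

print(naive_double_to_string(3.25))      # 3.2500000000
print(naive_double_to_string(1.1, 20))   # rounding noise from the stored value shows up in the tail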
This paper by Guy Steele provides details on how to do this correctly. It's much more subtle than you might think.
http://portal.acm.org/citation.cfm?id=93559
"Printing Floating-Point Numbers Quickly and Accurately" - Robert G. Burger
Scheme and C code for above.
As Oded mentioned in a comment, different languages will do this in different ways. As an example, here's how Ruby 1.9 does it (in C). Your best bet, just as a research exercise, will be to look into open-source languages and see how they do it.
