Representing a number in the octal system vs. the decimal system

I am not looking for help with my homework; I just need someone to point me in the right direction.
I know the answer theoretically; I am just stuck on how to prove it mathematically.
Here is the question:
Representing a number in the octal system requires, on average, about 10 percent more characters than in the decimal system.
How can I prove this mathematically?

Suppose you wanted to represent a given number x in both systems. In the decimal system, this takes on the order of log10(x) digits; in the octal system, it takes on the order of log8(x) digits.
For any a and b, loga(b) can be written as logc(b)/logc(a) for any base c. In particular, let c = 10. Then log8(x) = log10(x)/log10(8) ≈ 1.1 log10(x), which means log8(x) is about 1.1 times greater than log10(x) for any given x. Note that this ratio is exact aside from the rounding; what is approximate is estimating the digit counts by log10(x) and log8(x).

The approximate number of decimal digits required to represent a number x is log10(x), and the number of octal digits is log8(x).
This means the average ratio is log8(x)/log10(x).
Since log8(x) = ln(x)/ln(8) and log10(x) = ln(x)/ln(10),
the average ratio is ln(10)/ln(8) = 1.1073...
Of course, this is not a 100% exact demonstration. A real demonstration would define exactly the quantity we are trying to find (such as the average number of digits for numbers between 0 and n as n goes to infinity) and would work with the exact number of digits (which is an integer), not an approximation.
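As a quick numeric sanity check (a sketch, not a proof), you can compare real digit counts in both bases; the observed ratio drifts toward ln(10)/ln(8) ≈ 1.1073 as the numbers grow. For instance, in Python:

import math

print(math.log(10) / math.log(8))  # 1.1073... the constant ratio ln(10)/ln(8)

for x in (10**5, 10**20, 10**200):
    oct_digits = len(oct(x)) - 2   # strip the '0o' prefix
    dec_digits = len(str(x))
    print(dec_digits, oct_digits, oct_digits / dec_digits)
# ratios: 1.0, then ~1.095, then ~1.104, approaching ~1.107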


Why do VBA and Excel disagree on whether two cells are equal?

I am trying to compare two cells in a table:
The column "MR" is calculated using the formula =ABS([#Value]-A1) to determine the moving range of the column "Value". The values in the "Value" column are not rounded. The highlighted cells in the "MR" column (B3 and B4) are equal. I can enter the formula =B3=B4 into a cell and Excel says that B3 is equal to B4.
But when I compare them in VBA, VBA says that B4 is greater than B3. I can select cell B3 and enter the following into the Immediate Window: ? selection.value = selection.offset(1).value. That statement evaluates to False.
I tried removing the absolute value from the formula thinking that might have had something to do with it, but VBA still says they aren't equal.
I tried adding another row where Value=1.78 so MR=0.18. Interestingly, the MR in the new row (B5) is equal to B3, but is not equal to B4.
I then tried increasing the decimal of A4 to match the other values, and now VBA says they are equal. But when I added the absolute value back into the formula, VBA again says they are not equal. I removed the absolute value again and now VBA is saying they are not equal.
Why is VBA telling me the cells are not equal when Excel says they are? How can I reliably handle this situation through VBA going forward?
The problem is that the IEEE 754 Standard for Floating-Point Arithmetic has finite precision by design. Virtually every programming language is affected by this.
IEEE 754 is an extremely complex topic; even after you study it for months and believe you understand it fully, you are probably still fooling yourself!
Accurate floating-point comparisons are difficult and error-prone. Think long and hard before attempting to compare floating-point numbers!
The Excel program gets around the issue by cheating on the application side. VBA on the other hand follows the IEEE 754 spec for Double Precision (binary64) faithfully.
A Double value is represented in memory using 64 bits. These 64 bits are split into three distinct fields that are used in binary scientific notation:
The SIGN bit (1 bit to represent the sign of the value: pos/neg)
The EXPONENT (11 bits, biased in value by +1023)
The MANTISSA (53 bits, 52 bits stored + 1 bit implied)
The mantissa in this system leverages the fact that every normalized binary number begins with a leading 1, so that 1 is not stored in the bit pattern. It is implied, raising the effective mantissa precision to 53 bits for normal values.
The math works like this: Stored Value = SIGN VALUE * 2^UNBIASED EXPONENT * MANTISSA
Note that a stored value of 1 for the sign bit denotes a negative SIGN VALUE (-1) while a 0 denotes a positive SIGN VALUE (+1). The formula is SIGN VALUE = (-1) ^ (sign bit).
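To see those fields concretely, here is a short sketch that unpacks a double's 64-bit pattern (Python rather than VBA, purely as an illustration):

import struct

def double_fields(x):
    bits = struct.unpack('>Q', struct.pack('>d', x))[0]  # raw 64-bit pattern
    sign = bits >> 63                                    # 1 sign bit
    exponent = (bits >> 52) & 0x7FF                      # 11 bits, biased by +1023
    mantissa = bits & ((1 << 52) - 1)                    # 52 stored mantissa bits
    return hex(bits), sign, exponent - 1023, mantissa

print(double_fields(1.24))  # ('0x3ff3d70a3d70a3d7', 0, 0, <52-bit mantissa>)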
The problem always boils down to the same thing: the vast majority of real numbers cannot be expressed precisely within this system, which introduces small rounding errors that propagate like weeds.
It may help to think of this system as a grid of regularly spaced points. The system can represent ONLY the point-values and NONE of the real numbers between the points. All values assigned to a float will be rounded to one of the point-values (usually the closest point, but there are modes that enforce rounding upwards to the next highest point, or rounding downwards). Conducting any calculation on a floating-point value virtually guarantees the resulting value will require rounding.
To accent the obvious: there are an infinite number of real numbers between adjacent representable point-values on this grid, and all of them are rounded to the discrete grid points.
To make matters worse, the gap size doubles at every power of two as the grid expands away from true zero (in both directions). For example, the gap between grid points for values in the range 2 to 4 is twice as large as for values in the range 1 to 2. When representing values of large enough magnitude, the grid gap becomes massive, but close to true zero it is minuscule.
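The gap the grid picture describes is what numericists call the ULP (unit in the last place); Python 3.9+ can print it directly (a cross-language aside):

import math

for x in (1.0, 2.0, 4.0, 1e16):
    print(x, math.ulp(x))  # the grid gap at x
# 1.0    2.220446049250313e-16
# 2.0    4.440892098500626e-16  (doubles after the power of two)
# 4.0    8.881784197001252e-16
# 1e16   2.0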
With your example numbers...
1.24 is represented with the following binary:
Sign bit = 0
Exponent = 01111111111
Mantissa = 0011110101110000101000111101011100001010001111010111
The Hex pattern over the full 64 bits is precisely: 3FF3D70A3D70A3D7.
The precision is derived exclusively from the 53-bit mantissa and the exact decimal value from the binary is:
0.2399999999999999911182158029987476766109466552734375
In this instance a leading integer of 1 is implied by the hidden bit associated with the mantissa and so the complete decimal value is:
1.2399999999999999911182158029987476766109466552734375
Now notice that this is not precisely 1.24 and that is the entire problem.
Let's examine 1.42:
Sign bit = 0
Exponent = 01111111111
Mantissa = 0110101110000101000111101011100001010001111010111000
The Hex pattern over the full 64 bits is precisely: 3FF6B851EB851EB8.
With the implied 1 the complete decimal value is stored as:
1.4199999999999999289457264239899814128875732421875000
And again, not precisely 1.42.
Now, let's examine 1.6:
Sign bit = 0
Exponent = 01111111111
Mantissa = 1001100110011001100110011001100110011001100110011010
The Hex pattern over the full 64 bits is precisely: 3FF999999999999A.
Notice the repeating binary fraction in this case, which is truncated and rounded when the mantissa bits run out? Obviously 1.6 represented in binary (base 2) can never be precisely accurate, in the same way that 1/3 can never be accurately represented in decimal (base 10): 0.33333333333333333333333... ≠ 1/3.
With the implied 1 the complete decimal value is stored as:
1.6000000000000000888178419700125232338905334472656250
Not exactly 1.6 but closer than the others!
Now let's subtract the full stored double precision representations:
1.60 - 1.42 = 0.18000000000000015987
1.42 - 1.24 = 0.17999999999999993782
So as you can see, they are not equal at all.
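All of the exact expansions above can be reproduced without decoding bits by hand; Python's decimal module, for example, prints the exact value a double stores (a cross-check, not part of the VBA story):

from decimal import Decimal

print(Decimal(1.24))  # 1.2399999999999999911182158029987476766109466552734375
print(Decimal(1.42))  # 1.4199999999999999289457264239899814128875732421875
print(Decimal(1.6))   # 1.600000000000000088817841970012523233890533447265625
print(1.6 - 1.42 == 1.42 - 1.24)  # False, exactly as computed above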
The usual way to work around this is threshold testing, basically an inspection to see if two values are close enough... and that depends on you and your requirements. Be forewarned, effective threshold testing is way harder than it appears at first glance.
Here is a function to help you get started comparing two Double Precision numbers. It handles many situations well but not all because no function can.
Function Roughly(a#, b#, Optional within# = 0.00001) As Boolean
    Dim d#, x#, y#, z#
    Const TINY# = 1.17549435E-38 'SINGLE_MIN

    If a = b Then Roughly = True: Exit Function 'exactly equal
    x = Abs(a): y = Abs(b): d = Abs(a - b)
    If a <> 0# Then
        If b <> 0# Then
            z = x + y
            If z > TINY Then
                'compare the difference relative to the combined magnitudes
                Roughly = d / z < within
                Exit Function
            End If
        End If
    End If
    'a value is zero, or both are vanishingly small: absolute test instead
    Roughly = d < within * TINY
End Function
The idea here is to have the function return True if the two Doubles are Roughly the same Within a certain margin:
MsgBox Roughly(3.14159, 3.141591) '<---displays True
The Within margin defaults to 0.00001, but you can pass whatever margin you need.
And while we know that:
MsgBox 1.60 - 1.42 = 1.42 - 1.24 '<---displays False
Consider the utility of this:
MsgBox Roughly(1.60 - 1.42, 1.42 - 1.24) '<---displays True
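Incidentally, this relative-plus-absolute tolerance idea is the same one Python's math.isclose bakes in; shown here purely as a cross-language aside:

import math
print(math.isclose(1.6 - 1.42, 1.42 - 1.24, rel_tol=1e-9))  # True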
@chris neilsen linked to an interesting Microsoft page about Excel and IEEE 754.
And please read David Goldberg's seminal What Every Computer Scientist Should Know About Floating-Point Arithmetic. It changed the way I understood floating point numbers.

Restrict floats to allotted padding while parsing as string

I would like to print a series of floats with varying numbers of digits to the left of the decimal point. I would like these numbers to exactly fill a fixed padding with blank spaces, digits, and a decimal point.
Paraphrasing the data and code I have now:
floats = [321.1234561, 21.1234561, 1.1234561, 0.123456, 0.02345, 0.0034, 0.0004567]
for number in floats:
    print('{:>8.6f}'.format(number))
This outputs
321.123456
21.123456
1.123456
0.123456
0.023450
0.003400
0.000457
I am looking for a way to print the following in a for loop, assuming I don't know in advance how many digits will be to the left of the decimal point, and given that the number of digits to the left never exceeds the padding (8 in this example).
321.1234
21.12345
1.123456
0.123456
0.02345
0.0034
0.000457
Similar questions have been asked about printing floating points with a certain width, but the width they were talking about appeared to be the precision rather than the total number of characters used to print the number.
Edit:
I have added a number to the end of the list for the following reason: the use of the specifier 'g' with 7 significant figures was recommended by attdona. This prevents the padding from being exceeded for numbers greater than or equal to 1, but not for numbers less than 1 with more than 6 digits of precision. Using {:>8.7g} instead gives
321.1235
21.12346
1.123456
0.123456
0.02345
0.0034
0.0004567
Where the only one that exceeds the padding is the newly added one.
Use the General format type specifier g:
'{:>8.7g}'.format(number)
reference: https://docs.python.org/3/library/string.html#format-specification-mini-language
Update: for small numbers this format fails to align correctly. In this case you may adopt a mixed approach, but keep in mind that very small numbers will round to zero:
for number in floats:
    fstr = '{:>8.7g}'.format(number)
    if len(fstr) > 8:
        fstr = '{:>8.6f}'.format(number)
    print(fstr)
for i in floats:
    print('{:>8}'.format(f'{i:{8}.{8-len(str(int(i)))-1}f}'.rstrip('0')))
321.1235
21.12346
1.123456
0.123456
0.02345
0.0034
0.000457
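The one-liner above computes, for each value, the precision left over after the integer digits and the decimal point, then strips trailing zeros and right-justifies. A more readable sketch of the same idea (fit_width is a hypothetical name; width 8 and non-negative inputs assumed, as in the question's data):

def fit_width(x, width=8):
    # characters left for the fraction = width - integer digits - the point
    precision = max(width - len(str(int(x))) - 1, 0)
    s = f'{x:.{precision}f}'.rstrip('0')  # same trailing-zero strip as above
    return s.rjust(width)

for number in floats:
    print(fit_width(number))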

How to extract dyadic fraction from float

Now, floating-point single- and double-precision numbers, although they can approximate any sort of number (the same could be said of integers; floats are just more precise), are represented as binary fractions internally. For example, one tenth would be approximated as
0.00011001100110011... (the expansion only goes to the computer's precision, not to infinity)
Now, any binary number with finitely many bits has something called a dyadic fraction representation in mathematics (this has nothing to do with p-adic numbers). That means you can represent it as a fraction whose denominator is a power of 2. For example, let's say our computer approximates one tenth as 0.00011. The dyadic fraction for that is 3/32, or 3/(2^5), which is close to one tenth. Now for my technical question: what would be the simplest way to extract the dyadic fraction from a floating-point number?
Irrelevant note: if you are wondering why I would want to do this, it is because I am creating a surreal number library in Haskell. Dyadic fractions are easily translated into surreal numbers, which is why it is convenient that binary is easily translated into dyadic. (I'm sure I'll have trouble with the rational numbers, though.)
The decodeFloat function seems useful for this. Technically, you should also check that floatRadix is 2, but as far as I can see this is always the case in GHC.
Just be careful since it does not simplify mantissa and exponent. Here, if I evaluate decodeFloat (1.0 :: Double) I get an exponent of -52 and a mantissa of 2^52 which is not what I expected.
Also, toRational seems to generate a dyadic fraction. I am not sure this is always the case, though.
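If it helps to see the idea concretely, the same extraction is a one-liner in Python (shown only as an illustration; in Haskell you would build the fraction from decodeFloat's mantissa and exponent):

from fractions import Fraction

x = 0.1
num, den = x.as_integer_ratio()  # exact dyadic fraction, already in lowest terms
print(num, den)                  # 3602879701896397 36028797018963968 (= 2**55)
print(Fraction(x))               # the same value as an exact Fraction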
Hold your numbers in binary and convert to decimal for display.
Binary numbers are all dyadic. The digits after the binary point determine the power of two in the denominator, and the digits read without the point give the numerator. That's binary numbers for you.
There is an ideal representation for surreal numbers in binary. I call them "sinary". It's this:
0s is Not a number
1s is zero
10s is neg one
11s is one
100s is neg two
101s is neg half
110s is half
111s is two
... etc...
So you see that the standard binary count matches the surreal birth order of numeric values when evaluated in sinary. The way to determine the numeric value of a sinary is that the 1's are rights and the 0's are lefts. We start with ±1's and then 1/2, 1/4, 1/8, etc., with sign equal to + for 1 and − for 0.
ex: evaluating sinary
1011011s
-> is the 91st surreal number (because 64+16+8+2+1 = 91)
-> with a value of −0.28125, because...
1011011
NLRRLRR
+-++-++
+ 0 − 1 + 1/2 + 1/4 − 1/8 + 1/16 + 1/32
= 0 − 32/32 + 16/32 + 8/32 − 4/32 + 2/32 + 1/32
= − 9/32
The surreal numbers form a binary tree, so there is an ideal binary format matching each number's location on the tree, according to the left/right pattern taken to reach it. Assign 1 to right and 0 to left. Then the birth order of a surreal number is equal to the binary count of this representation; i.e., the 15th surreal number value represented in sinary is the 15th number in the standard binary count. The value of a sinary is the surreal label value: strip the leading bit from the representation, and start adding +1's or −1's depending on whether the number starts with 1 or 0 after that first bit. Then, once the bit flips, begin adding and subtracting halved values (1/2, 1/4, 1/8, etc.), using + or − according to the bit value (1/0).
I have tested this format and it seems to work well. And there are some other secrets... such as: the left and right of any sinary representation are the same binary format with the tail clipped to the last 0 and the last 1, respectively. Conversion to a decimal dyadic is NOT required in order to perform the recursive functions requested by Conway.
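To make the evaluation rule concrete, here is a small sketch (Python; sinary_value is a hypothetical name) of the walk just described. It reproduces the worked example, giving -9/32 for 1011011:

from fractions import Fraction

def sinary_value(bits):
    # bits is e.g. '1011011'; the leading 1 is the root, worth 0
    rest = bits[1:]
    value = Fraction(0)
    step = Fraction(1)            # unit steps until the first bit flip
    halving = False
    for b in rest:
        if not halving and b != rest[0]:
            halving = True        # first flip: switch to halved steps
        if halving:
            step /= 2
        value += step if b == '1' else -step
    return value

print(sinary_value('1011011'))    # -9/32
print(sinary_value('101'))        # -1/2 (neg half, as in the table above)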

Conversion of numeric to string in MATLAB

Suppose I want to convert the number 0.011124325465476454 to a string in MATLAB.
If I hit
mat2str(0.011124325465476454,100)
I get 0.011124325465476453 which differs in the last digit.
If I hit num2str(0.011124325465476454,'%5.25f')
I get 0.0111243254654764530000000
which is padded with undesirable zeros and differs in the last digit (3 should be 4).
I need a way to convert numerics with an arbitrary number of decimals to their EXACT string matches (no zeros padded, no final digit modified).
Is there such as way?
EDIT: Since I didn't have in mind the info about precision that Amro and nrz provided, I am adding some additional info about the problem. The numbers I actually need to convert come from a C++ program that outputs them to a txt file, and they are all of the C++ double type. [NOTE: The part that inputs the numbers from the txt file into MATLAB is not coded by me, and I'm not allowed to modify it to keep the numbers as strings rather than converting them to numerics. I only have access to this code's "output", which is the numerics I'd like to convert.] So far I haven't come across numbers with more than 17 decimals (NOTE: consequently, the example provided above, with 18 decimals, is not very indicative).
Now, if the number has 15 digits eg 0.280783055069002
then num2str(0.280783055069002,'%5.17f') or mat2str(0.280783055069002,17) returns
0.28078305506900197
which is not the exact number (see last digits).
But if I hit mat2str(0.280783055069002,15) I get
0.280783055069002 which is correct!!!
There are probably a million ways to "code around" the problem (e.g. create a routine that does the conversion), but isn't there some way, using MATLAB's standard built-ins, to get the desired result when I input a number with an arbitrary number of decimals (but no more than 17)?
My HPF toolbox also allows you to work with an arbitrary precision of numbers in MATLAB.
In MATLAB, try this:
>> format long g
>> x = 0.280783054
x =
0.280783054
As you can see, MATLAB writes it out with the digits you have posed. But how does MATLAB really "feel" about that number? What does it store internally? See what sprintf says:
>> sprintf('%.60f',x)
ans =
0.280783053999999976380053112734458409249782562255859375000000
And this is what HPF sees, when it tries to extract that number from the double:
>> hpf(x,60)
ans =
0.280783053999999976380053112734458409249782562255859375000000
The fact is, almost all decimal numbers are NOT representable exactly in floating point arithmetic as a double. (0.5 or 0.375 are exceptions to that rule, for obvious reasons.)
However, when stored in a decimal form with 18 digits, we see that HPF did not need to store the number as a binary approximation to the decimal form.
x = hpf('0.280783054',[18 0])
x =
0.280783054
>> x.mantissa
ans =
2 8 0 7 8 3 0 5 4 0 0 0 0 0 0 0 0 0
What niels does not appreciate is that decimal numbers are not stored in decimal form as a double. For example what does 0.1 look like internally?
>> sprintf('%.60f',0.1)
ans =
0.100000000000000005551115123125782702118158340454101562500000
As you see, MATLAB does not store it as 0.1. In fact, MATLAB stores 0.1 as a binary number, here in effect...
1/16 + 1/32 + 1/256 + 1/512 + 1/4096 + 1/8192 + 1/65536 + ...
or if you prefer
2^-4 + 2^-5 + 2^-8 + 2^-9 + 2^-12 + 2^-13 + 2^-16 + ...
To represent 0.1 exactly, this would take infinitely many such terms since 0.1 is a repeating number in binary. MATLAB stops at 52 bits. Just like 2/3 = 0.6666666666... as a decimal, 0.1 is stored only as an approximation as a double.
This is why your problem really is entirely about precision and the binary form in which a double is stored.
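The same exact expansion can be confirmed outside MATLAB; for instance, Python's decimal module (shown only as a cross-check) prints the identical stored value:

from decimal import Decimal
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625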
As a final edit after chat...
The point is that MATLAB uses a double to represent a number. So it will take in a number with up to 15 decimal digits and be able to spew them out with the proper format setting.
>> format long g
>> eps
ans =
2.22044604925031e-16
So for example...
>> x = 1.23456789012345
x =
1.23456789012345
And we see that MATLAB has gotten it right. But now add one more digit to the end.
>> x = 1.234567890123456
x =
1.23456789012346
In its full glory, look at x, as MATLAB sees it:
>> sprintf('%.60f',x)
ans =
1.234567890123456024298320699017494916915893554687500000000000
So always beware the last digit of any floating point number. MATLAB will try to round things intelligently, but 15 digits is just on the edge of where you are safe.
Is it necessary to use a tool like HPF or MP to solve such a problem? No, as long as you recognize the limitations of a double. However tools that offer arbitrary precision give you the ability to be more flexible when you need it. For example, HPF offers the use and control of guard digits down in that basement area. If you need them, they are there to save the digits you need from corruption.
You can use the Multiple Precision Toolkit from the MATLAB File Exchange for arbitrary-precision numbers. Note that most decimal numbers do not have an exact binary floating-point representation, which is the root of the problem.
That's because your number is beyond the precision of the double numeric type (which gives you between 15 and 17 significant decimal digits). In your case, it is rounded to the nearest representable number as soon as the literal is evaluated.
If you need more precision than what the double-precision floating-points provides, store the numbers in strings, or use arbitrary-precision libraries. For example use the Symbolic Toolbox:
sym('0.0111243254654764549999999')
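For what it's worth, that 15-to-17-digit window is exactly the round-trip guarantee for doubles: 17 significant digits always suffice to recover the stored double, but the recovered string names the double, not the decimal you originally typed. A compact illustration (Python used only for brevity):

x = float('0.011124325465476454')
s = '%.17g' % x       # 17 significant digits always round-trip a double
print(s)              # the stored double's 17-digit form
print(float(s) == x)  # True: the string recovers exactly the same double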
You cannot get an EXACT string, since the number is stored in the double type (or even the long double type).
The stored number will be subtly more or less than the number you give.
A computer only knows the binary digits 0 and 1, and a number that terminates in one radix may not terminate in another. For example, take 1/3: in radix 10 it yields 0.33333333... (the ellipsis indicates that more digits, here 3s, would follow), and it will be truncated to 0.333333. In radix 3 it yields 0.1 exactly, no more and no less. In radix 2 it yields 0.01010101..., which will likely be truncated to 0.01010101 in the computer; that's 85/256, less than 1/3 because of the rounding, so the next time you fetch the number, it won't be the one you wanted.
So from the beginning, you should store the number as a string instead of in a float type; otherwise it will lose precision.
Considering the precision problem, MATLAB provides symbolic computation with arbitrary precision.

How do you convert a double to a string?

I know that most programming languages have functions built in for doing that for you, but how do those functions work?
The javadoc about the Double toString() method is quite comprehensive:
Creates a string representation of the double argument. All characters mentioned below are ASCII characters.
If the argument is NaN, the result is the string "NaN".
Otherwise, the result is a string that represents the sign and magnitude (absolute value) of the argument. If the sign is negative, the first character of the result is '-' ('-'); if the sign is positive, no sign character appears in the result. As for the magnitude m:
If m is infinity, it is represented by the characters "Infinity"; thus, positive infinity produces the result "Infinity" and negative infinity produces the result "-Infinity".
If m is zero, it is represented by the characters "0.0"; thus, negative zero produces the result "-0.0" and positive zero produces the result "0.0".
If m is greater than or equal to 10^-3 but less than 10^7, then it is represented as the integer part of m, in decimal form with no leading zeroes, followed by '.' (.), followed by one or more decimal digits representing the fractional part of m.
If m is less than 10^-3 or not less than 10^7, then it is represented in so-called "computerized scientific notation." Let n be the unique integer such that 10^n<=m<10^(n+1); then let a be the mathematically exact quotient of m and 10^n so that 1<=a<10. The magnitude is then represented as the integer part of a, as a single decimal digit, followed by '.' (.), followed by decimal digits representing the fractional part of a, followed by the letter 'E' (E), followed by a representation of n as a decimal integer, as produced by the method Integer.toString(int).
How many digits must be printed for the fractional part of m or a? There must be at least one digit to represent the fractional part, and beyond that as many, but only as many, more digits as are needed to uniquely distinguish the argument value from adjacent values of type double. That is, suppose that x is the exact mathematical value represented by the decimal representation produced by this method for a finite nonzero argument d. Then d must be the double value nearest to x; or if two double values are equally close to x, then d must be one of them and the least significant bit of the significand of d must be 0.
Is that enough? Otherwise you might like to look up the implementation too...
A simple (but non-generic, naïve and slow) way:
convert the number to an integer, then divide this value by 10 stepwise to find its digits in reverse order. Concatenate them together and you have the integer representation.
subtract the integer from the original number, then multiply by 10 stepwise to find the digits after the decimal point. Concatenate the first string with a point and this second string (a sketch of this scheme follows the list of problems below).
This has a few problems, of course:
slow as hell;
doesn't work for negative numbers;
won't give you exponential notation for very small or large numbers.
All in all, it's an idea, but not a very good one; I suspect there are no programming languages that do this.
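For what it's worth, here is a sketch of that naive scheme (Python purely for illustration, with a fixed number of fractional digits assumed); it demonstrates the idea and inherits every problem on the list above:

def naive_to_string(x, frac_digits=10):
    # integer part: peel off digits in reverse by dividing by 10
    n = int(x)
    int_str = '0' if n == 0 else ''
    while n > 0:
        int_str = str(n % 10) + int_str
        n //= 10
    # fractional part: multiply by 10 stepwise and read off each digit
    frac = x - int(x)
    frac_str = ''
    for _ in range(frac_digits):
        frac *= 10
        digit = int(frac)
        frac_str += str(digit)
        frac -= digit
    return int_str + '.' + frac_str

print(naive_to_string(3.25))  # '3.2500000000' (no rounding, no sign handling)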
This paper by Guy Steele provides details on how to do this correctly. It's much more subtle than you might think.
http://portal.acm.org/citation.cfm?id=93559
"Printing Floating-Point Numbers Quickly and Accurately" - Robert G. Burger
Scheme and C code for above.
As Oded mentioned in a comment, different languages will do this in different ways. As an example, here's how Ruby 1.9 does it (in C). Your best bet, just as a research exercise, will be to look into open-source languages and see how they do it.
