Stata seemingly not actually rounding with round()

Stata has a round() function. One can select the units it rounds to. I want to round an arbitrary floating point value to two decimal places using round(ArbitraryValue, 0.01). Stata's display seems to understand this. But somehow the internal representation of round(ArbitraryValue, 0.01) still has the unrounded floating point value:
. local LevelA = 99.98765432123321
. ttest mpg==20, level(`LevelA')
level() can have at most two digits after the decimal point
r(198);
. local LevelB = round(`LevelA',0.01)
. di `LevelB'
99.99
. ttest mpg==20, level(`LevelB')
level() must be between 10 and 99.99 inclusive
r(198);
. set trace on
. ttest mpg==20, level(`LevelB')
[SNIP]
= local 0 mpg = 20, level(99.99000000000001)
[SNIP]
r(198);
What am I not understanding about how to correctly round?

You are being bitten by a basic fact. You want to see exact decimals, but Stata doesn't use exact decimals here; it necessarily calculates in binary. Much ingenuity at several levels hides this from you most of the time, but occasionally it breaks through to the surface.
round() can't possibly find an exact binary representation of 99.99 because there isn't one. The same applies to any multiple or fraction of 0.1, 0.2, ..., 0.9, except for some multiples or fractions of 0.5.
In that sense, only exceptionally can round() do what you expect, produce an exact multiple of 0.01.
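None of this is specific to Stata. A quick check in Python, whose floats are the same IEEE-754 doubles, makes the point:

from decimal import Decimal

# The exact value of the double closest to 99.99:
print(Decimal(99.99))                        # a long decimal, not exactly 99.99
# round() can only return one of these nearest doubles:
print(Decimal(round(99.98765432123321, 2)))  # likewise not exactly 99.99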
The calculations induced by display are not an exception to this principle; it's just that default display formats usually hide the ugly truth from you.
What you want is in fact a string manipulation, namely display with a specified format such as %3.2f which will guarantee that Stata thinks it is seeing two decimal places.
. sysuse auto, clear
(1978 Automobile Data)
. local LevelA = 99.98765432123321
. local myLevelA : di %3.2f `LevelA'
. ttest mpg == 20, level(`myLevelA')
One-sample t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
mpg | 74 21.2973 .6725511 5.785503 19.9569 22.63769
------------------------------------------------------------------------------
mean = mean(mpg) t = 1.9289
Ho: mean = 20 degrees of freedom = 73
Ha: mean < 20 Ha: mean != 20 Ha: mean > 20
Pr(T < t) = 0.9712 Pr(|T| > |t|) = 0.0576 Pr(T > t) = 0.0288
Run search precision in Stata to find out more.

If you really need to change the values themselves rather than just their display:
local varlist "Var1 Var2"    // variables to round
local dp = 2                 // number of decimals
foreach Var of local varlist {
    gen `Var'_`dp'dp = floor(`Var'*(10^`dp'))/(10^`dp')
    replace `Var'_`dp'dp = `Var'_`dp'dp + 1/(10^`dp') if ///
        `Var' - floor(`Var'*(10^`dp'))/(10^`dp') >= 5/(10^(`dp'+1)) & `Var' != .
}

Related

Convert Qlikview Hex Date to Normal Date

I exported an XML file from Qlikview and the dates are in this 16-character hex form (e.g. 40E5A40D641FDB97). I have tried multiple ways to convert them to floating-point numbers and then dates, but all methods have failed (incl. Excel HEX2DEC).
Has anyone dealt with this issue before? Would greatly appreciate any help!
Here is a Power Query routine that will convert that Hex number into its Date Equivalent:
I generate the binary equivalent of the Hex number using a lookup table and concatenating the results.
The algorithm should be clear in the coding, and it follows the rules set out in IEEE-754.
For the dates you mention in your question, it provides the same results.
Note that this routine assumes a valid value encoded as you describe your date representations from Qlikview. It is not a general purpose routine.
let
//the 16-character hex string to convert; the example value is taken from the question
hexNum = "40E5A40D641FDB97",
//don't really need the Decimal column
hexConvTable = Table.FromRecords({
[Hex="0", Dec=0, Bin = "0000"],
[Hex="1", Dec=1, Bin = "0001"],
[Hex="2", Dec=2, Bin = "0010"],
[Hex="3", Dec=3, Bin = "0011"],
[Hex="4", Dec=4, Bin = "0100"],
[Hex="5", Dec=5, Bin = "0101"],
[Hex="6", Dec=6, Bin = "0110"],
[Hex="7", Dec=7, Bin = "0111"],
[Hex="8", Dec=8, Bin = "1000"],
[Hex="9", Dec=9, Bin = "1001"],
[Hex="A", Dec=10, Bin = "1010"],
[Hex="B", Dec=11, Bin = "1011"],
[Hex="C", Dec=12, Bin = "1100"],
[Hex="D", Dec=13, Bin = "1101"],
[Hex="E", Dec=14, Bin = "1110"],
[Hex="F", Dec=15, Bin = "1111"]},
type table[Hex = Text.Type, Dec = Int64.Type, Bin = Text.Type]),
hexUp = Text.Upper(hexNum),
hexSplit = Table.FromList(Text.ToList(hexUp),Splitter.SplitByNothing(),{"hexNum"}),
//To sort back to original order
addIndex = Table.AddIndexColumn(hexSplit,"Index",0,1,Int64.Type),
//combine with conversion table
binConv = Table.Sort(
Table.Join(
addIndex,"hexNum",hexConvTable,"Hex",JoinKind.LeftOuter),
{"Index", Order.Ascending}),
//equivalent binary
binText = Text.Combine(binConv[Bin]),
sign = Text.Start(binText,1),
//change exponent binary parts to numbers
expBin = List.Transform(Text.ToList(Text.Middle(binText,1,11)),Number.FromText),
//exponent bias will vary depending on the precision being used
expBias = 1023, //Number.Power(2,10-List.PositionOf(expBin,1))-1,
expPwr= List.Reverse({0..10}),
exp = List.Accumulate({0..10},0,(state, current) =>
state + (expBin){current} * Number.Power(2,expPwr{current})) - expBias,
//mantissa is the last 52 bits (offset 12), with weights 2^-1 .. 2^-52
mantBin = List.Transform(Text.ToList(Text.Middle(binText,12,52)),Number.FromText),
mantPwr = {1..52},
mant = List.Accumulate({0..51},0,(state, current) =>
state + (mantBin){current} / Number.Power(2,mantPwr{current})) + 1,
dt = mant * Number.Power(2,exp)
in
DateTime.From(dt)
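For cross-checking, the same decoding can be sketched in Python with the standard struct module; the 1899-12-30 epoch is an assumption based on QlikView's Excel-style day serials:

import struct
from datetime import datetime, timedelta

def qlik_hex_to_datetime(hex_str):
    # Reinterpret the 16 hex digits as the bit pattern of a big-endian IEEE-754 double
    serial = struct.unpack('>d', bytes.fromhex(hex_str))[0]
    # Assumed epoch: QlikView day serials, like Excel's, count from 1899-12-30
    return datetime(1899, 12, 30) + timedelta(days=serial)

print(qlik_hex_to_datetime('40E5A40D641FDB97'))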
You can use standard Windows formatting with Num# (convert text to number) and Num to convert from hex to bin in QlikView:
// example data from inline table in loading script
[our_hex_numbers]:
LOAD
Num(Num#(hex,'(HEX)'),'(BIN)') as bin
Inline
[hex,
'A',
'B',
'C'];
Here is the result: A -> 1010, B -> 1011, C -> 1100.
This reference shows how floating point numbers are represented. In double precision (using a total of 64 bits) there is a sign bit, 11-bit exponent and 53-bit significand or mantissa. Observant readers will notice that gives a total of 65 bits: this is because the most significant bit in the mantissa is a hidden bit which by convention is always set to 1 and does not have to be stored.
Taking the first example (40E5A40D641FDB97), we have:
Exponent
The exponent is the first three hexadecimal digits (the sign bit plus 11 exponent bits, but the sign bit will always be zero for dates since they are positive numbers). It can be converted using any suitable standard method, e.g. in Excel 365:
=LET(L,LEN(A2),seq,SEQUENCE(L),SUM((FIND(MID(A2,seq,1),"0123456789ABCDEF")-1)*16^(L-seq)))
The correct result is obtained by subtracting 1023 (the offset) from the converted value e.g.
40E -> 1038
1038 - 1023 -> 15
So the multiplier is 2^15.
Significand
We need to take the right-hand 13 hexadecimal digits (52 bits) of the string and convert it to a fraction using whatever is your favourite conversion method e.g. in Excel 365:
=LET(L,LEN(A2),seq,SEQUENCE(L),SUM((FIND(MID(A2,seq,1),"0123456789ABCDEF")-1)*16^(-seq)))
Then you need to add 1 (this is the hidden bit which is always set to 1).
Putting this together:
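Here is the same decomposition as a quick Python sketch (the variable names are mine, for illustration):

h = 0x40E5A40D641FDB97
exponent = ((h >> 52) & 0x7FF) - 1023     # 0x40E -> 1038; 1038 - 1023 = 15
fraction = (h & ((1 << 52) - 1)) / 2**52  # right-hand 13 hex digits as a fraction
print((1 + fraction) * 2**exponent)       # the resulting day serial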
I made a report on QlikView licenses for myself using the file CalData.pgo.xml, and I ran into a non-critical problem of converting hex to date; without this conversion the report would not be complete (the LastUsed and ToBeDeleted fields).
I searched around but didn't find anything immediately useful, except for converting 13-digit hex values in Excel.
But in the file CalData.pgo.xml the date is stored as 16 hex digits, not 13. I did not work out how to adapt the Excel formula for 16 digits, but I realized that a 16-digit hex value can be trimmed to 13 digits, and it seems nothing significant is lost in the process (see the note after the formula).
It works fine for me:
=date((num(Num#(right([PerDocumentCalData/NamedCalsAllocated/CalAllocEntry.LastUsed],13),'(HEX)') )*pow(16,-13)+1)*Pow(2,15),'DD.MM.YYYY hh:mm')
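A note on why the trimming works: the right-hand 13 hex digits are exactly the 52 mantissa bits, and the hard-coded Pow(2,15) is the right multiplier whenever the dropped exponent field is 0x40E, i.e. for day serials in [32768, 65536), roughly the years 1989 to 2079. A quick Python sketch of the same computation:

h = '40E5A40D641FDB97'
frac = int(h[-13:], 16) / 16**13  # the 52 mantissa bits as a binary fraction
serial = (1 + frac) * 2**15       # valid while the exponent field is 0x40E
print(serial)                     # day serial; the fractional part is the time of day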

Strange result from Summation of numbers in Excel and Matlab [duplicate]

I am writing a program where I need to delete duplicate points stored in a matrix. The problem is that when it comes to checking whether those points are in the matrix, MATLAB can't recognize them although they exist.
In the following code, intersections function gets the intersection points:
[points(:,1), points(:,2)] = intersections(...
obj.modifiedVGVertices(1,:), obj.modifiedVGVertices(2,:), ...
[vertex1(1) vertex2(1)], [vertex1(2) vertex2(2)]);
The result:
>> points
points =
12.0000 15.0000
33.0000 24.0000
33.0000 24.0000
>> vertex1
vertex1 =
12
15
>> vertex2
vertex2 =
33
24
Two points (vertex1 and vertex2) should be eliminated from the result. This should be done by the commands below:
points = points((points(:,1) ~= vertex1(1)) | (points(:,2) ~= vertex1(2)), :);
points = points((points(:,1) ~= vertex2(1)) | (points(:,2) ~= vertex2(2)), :);
After doing that, we have this unexpected outcome:
>> points
points =
33.0000 24.0000
The outcome should be an empty matrix. As you can see, the first (or second?) pair of [33.0000 24.0000] has been eliminated, but not the second one.
Then I checked these two expressions:
>> points(1) ~= vertex2(1)
ans =
0
>> points(2) ~= vertex2(2)
ans =
1 % <-- It means 24.0000 is not equal to 24.0000?
What is the problem?
More surprisingly, I made a new script that has only these commands:
points = [12.0000 15.0000
33.0000 24.0000
33.0000 24.0000];
vertex1 = [12 ; 15];
vertex2 = [33 ; 24];
points = points((points(:,1) ~= vertex1(1)) | (points(:,2) ~= vertex1(2)), :);
points = points((points(:,1) ~= vertex2(1)) | (points(:,2) ~= vertex2(2)), :);
The result as expected:
>> points
points =
Empty matrix: 0-by-2
The problem you're having relates to how floating-point numbers are represented on a computer. A more detailed discussion of floating-point representations appears towards the end of my answer (The "Floating-point representation" section). The TL;DR version: because computers have finite amounts of memory, numbers can only be represented with finite precision. Thus, the accuracy of floating-point numbers is limited to a certain number of decimal places (about 16 significant digits for double-precision values, the default used in MATLAB).
Actual vs. displayed precision
Now to address the specific example in the question... while 24.0000 and 24.0000 are displayed in the same manner, it turns out that they actually differ by very small decimal amounts in this case. You don't see it because MATLAB only displays 4 significant digits by default, keeping the overall display neat and tidy. If you want to see the full precision, you should either issue the format long command or view a hexadecimal representation of the number:
>> pi
ans =
3.1416
>> format long
>> pi
ans =
3.141592653589793
>> num2hex(pi)
ans =
400921fb54442d18
Initialized values vs. computed values
Since there are only a finite number of values that can be represented for a floating-point number, it's possible for a computation to result in a value that falls between two of these representations. In such a case, the result has to be rounded off to one of them. This introduces a small machine-precision error. This also means that initializing a value directly or by some computation can give slightly different results. For example, the value 0.1 doesn't have an exact floating-point representation (i.e. it gets slightly rounded off), and so you end up with counter-intuitive results like this due to the way round-off errors accumulate:
>> a=sum([0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1]); % Sum 10 0.1s
>> b=1; % Initialize to 1
>> a == b
ans =
logical
0 % They are unequal!
>> num2hex(a) % Let's check their hex representation to confirm
ans =
3fefffffffffffff
>> num2hex(b)
ans =
3ff0000000000000
How to correctly handle floating-point comparisons
Since floating-point values can differ by very small amounts, any comparisons should be done by checking that the values are within some range (i.e. tolerance) of one another, as opposed to exactly equal to each other. For example:
a = 24;
b = 24.000001;
tolerance = 0.001;
if abs(a-b) < tolerance, disp('Equal!'); end
will display "Equal!".
You could then change your code to something like:
points = points((abs(points(:,1)-vertex1(1)) > tolerance) | ...
(abs(points(:,2)-vertex1(2)) > tolerance),:)
Floating-point representation
A good overview of floating-point numbers (and specifically the IEEE 754 standard for floating-point arithmetic) is What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg.
A binary floating-point number is actually represented by three integers: a sign bit s, a significand (or coefficient/fraction) b, and an exponent e. For double-precision floating-point format, each number is represented by 64 bits laid out in memory as follows: 1 sign bit, then 11 exponent bits, then 52 explicitly stored significand bits.
The real value can then be found with the following formula: value = (-1)^s * (1 + b) * 2^(e - 1023), where b is the 52 stored significand bits read as a binary fraction and 1023 is the exponent bias.
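As an illustration, a short Python sketch that applies this formula to the num2hex(pi) value shown earlier:

h = 0x400921fb54442d18             # num2hex(pi) from above
s = h >> 63                        # sign bit
e = (h >> 52) & 0x7FF              # 11-bit biased exponent
b = (h & ((1 << 52) - 1)) / 2**52  # 52-bit fraction
print((-1)**s * (1 + b) * 2**(e - 1023))  # 3.141592653589793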
This format allows for number representations in the range 10^-308 to 10^308. For MATLAB you can get these limits from realmin and realmax:
>> realmin
ans =
2.225073858507201e-308
>> realmax
ans =
1.797693134862316e+308
Since there are a finite number of bits used to represent a floating-point number, there are only so many finite numbers that can be represented within the above given range. Computations will often result in a value that doesn't exactly match one of these finite representations, so the values must be rounded off. These machine-precision errors make themselves evident in different ways, as discussed in the above examples.
In order to better understand these round-off errors it's useful to look at the relative floating-point accuracy provided by the function eps, which quantifies the distance from a given number to the next largest floating-point representation:
>> eps(1)
ans =
2.220446049250313e-16
>> eps(1000)
ans =
1.136868377216160e-13
Notice that the precision is relative to the size of a given number being represented; larger numbers will have larger distances between floating-point representations, and will thus have fewer digits of precision following the decimal point. This can be an important consideration with some calculations. Consider the following example:
>> format long % Display full precision
>> x = rand(1, 10); % Get 10 random values between 0 and 1
>> a = mean(x) % Take the mean
a =
0.587307428244141
>> b = mean(x+10000)-10000 % Take the mean at a different scale, then shift back
b =
0.587307428244458
Note that when we shift the values of x from the range [0 1] to the range [10000 10001], compute a mean, then subtract the mean offset for comparison, we get a value that differs for the last 3 significant digits. This illustrates how an offset or scaling of data can change the accuracy of calculations performed on it, which is something that has to be accounted for with certain problems.
Look at this article: The Perils of Floating Point. Though its examples are in FORTRAN, it is relevant to virtually any modern programming language, including MATLAB. Your problem (and its solution) is described in the "Safe Comparisons" section.
Type
format long g
This command will show the FULL value of the number. It's likely to be something like 24.00000021321 != 24.00000123124
Try writing
0.1 + 0.1 + 0.1 == 0.3.
Warning: You might be surprised about the result!
Maybe the two numbers are really 24.0 and 24.000000001 but you're not seeing all the decimal places.
Check out the Matlab EPS function.
Matlab uses floating point math up to 16 digits of precision (only 5 are displayed).

Why is a whole number not the same when rounded in a custom function?

I have the following custom function that rounds a number to a user-specified accuracy.
It is based on the general formula:
ROUND(Value/Accuracy, 0)*Accuracy
There are times when Value/Accuracy is exactly a multiple of 0.5, and Excel does not apply the common rounding rule (odd number: round up, even number: round down), so I made a custom function.
Function CheckTemp(val As Range, NumAccuracy As Range) As Double
    Dim Temp As Double
    Temp = Abs(val) / NumAccuracy
    'Fractional part of Temp/0.5; zero means Temp is an exact multiple of 0.5
    CheckTemp = (Temp / 0.5) - WorksheetFunction.RoundDown(Temp / 0.5, 0)
End Function
If CheckTemp = 0, then 'val' falls under this case where depending on the number, I want to specifically round down or up. If it is false, then the general Round() command is used.
I do have a weird case when Accuracy = 0.1 and any 'val' that meets the requirement:
#.X5000000...,
where: 'X' is an ODD number, or zero (i.e. 0,1,3,5,7,9).
Depending on the whole number, the function does not work.
Example:
val = -5361202.55
NumAccuracy = 0.1
Temp = 53612025.5
Temp / 0.5 = 107224051
WorksheetFunction.RoundDown(Temp / 0.5, 0) = 107224051
CheckTemp = -1.49012E-08
If I break this check into two separate functions, one to output (Temp/0.5) and WF.RoundDown(Temp / 0.5) to the Excel worksheet, and then subtract the two in the worksheet I get EXACTLY 0.
However with VBA coding, an error comes into play and results in a non-zero answer (even more worrisome a NEGATIVE value, which should be impossible when Temp is always positive, and RoundDown('x','y') will always result in a smaller number than 'x').
'val' can be a very large number with many decimal places, so I am trying to keep the 'Double' parameter if possible.
I tried 'Single' variable type and it seems to remove the error with CheckTemp(), but I am worried an end-user may use a number that exceeds the 'Single' variable limit.
You are not wrong, but native rounding in VBA is severely limited.
So, use a proper rounding function like RoundMid as found in my project VBA.Round. It uses Decimal if possible to avoid such errors.
Example:
Value = 5361202.55
NumAccuracy = 0.1
RoundedValue = RoundMid(Value / NumAccuracy, 0) * NumAccuracy
RoundedValue -> 5361202.6
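For comparison, the same idea (exact decimal arithmetic instead of doubles) as a short Python sketch; this is an illustration, not the VBA.Round code itself:

from decimal import Decimal, ROUND_HALF_UP

val = Decimal('-5361202.55')  # exact, because it is constructed from a string
acc = Decimal('0.1')
# Divide, round half away from zero to an integer, then scale back
rounded = (abs(val) / acc).quantize(Decimal('1'), rounding=ROUND_HALF_UP) * acc
print(rounded.copy_sign(val))  # -5361202.6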

Trying to end up with two decimal points on a float, but keep getting 0.0

I have a float and would like to limit to just two decimals.
I've tried format(), and round(), and still just get 0, or 0.0
x = 8.972990688205408e-05
print ("x: ", x)
print ("x using round():", round(x))
print ("x using format():"+"{:.2f}".format(x))
output:
x: 8.972990688205408e-05
x using round(): 0
x using format():0.00
I'm expecting 8.98, or 8.97 depending on what method used. What am I missing?
You are using scientific notation. As glhr pointed out in the comments, you are trying to round 8.972990688205408e-05 = 0.00008972990688205408. Rounding that to two decimal places keeps only the first two 0s after the decimal point, resulting in 0.00. You will have to format via {0:.2e}:
x = 8.972990688205408e-05
print("{0:.2e}".format(x))
This prints:
8.97e-05
You asked in one of your comments how to get only the 8.97.
This is the way to do it:
y = x*1e+05
print("{0:.2f}".format(y))
output:
8.97
In Python (and many other programming languages), a number suffixed with e followed by a number is that value times the corresponding power of 10.
For example
8.9729e3 = 8.9729 x 10^3 = 8972.9
8.9729e-5 = 8.9729 x 10^-5 = 0.000089729
8.9729e0 = 8.9729 x 10^0 = 8.9729
8.972990688205408e-05 = 8.972990688205408 x 10^-5 = 0.00008972990688205408
8.9729e # invalid syntax
As pointed out by other answer, if you want to print out the exponential round up, you need to use the correct Python string format, you have many choices to choose from. i.e.
e Floating point exponential format (lowercase, precision defaults to 6 digits)
E Floating point exponential format (uppercase, precision defaults to 6 digits)
g Same as "e" if exponent is greater than -4 or less than precision, "f" otherwise
G Same as "E" if exponent is greater than -4 or less than precision, "F" otherwise
e.g.
x = 8.972990688205408e-05
print('{:e}'.format(x)) # 8.972991e-05
print('{:E}'.format(x)) # 8.972991E-05
print('{:.2e}'.format(x)) # 8.97e-05
(Update)
OP asked for a way to remove the exponent "e" notation. Since str.format() or "%" notation just outputs a string object, breaking the "e" part out of the string will do the trick.
'{:.2e}'.format(x).split("e") # ['8.97', '-05']
print('{:.2e}'.format(x).split('e')[0]) # 8.97
If I understand correctly, you only want to round the mantissa/significand? If you want to keep x as a float and output a float, just specify the precision when calling round:
x = round(8.972990688205408e-05,7)
Output:
8.97e-05
However, I recommend converting x with the decimal module first, which "provides support for fast correctly-rounded decimal floating point arithmetic" (see this answer):
from decimal import Decimal
x = Decimal('8.972990688205408e-05').quantize(Decimal('1e-7')) # output: 0.0000897
print('%.2E' % x)
Output:
8.97E-05
Or use the short form of the format method, which gives the same output:
print(f"{x:.2E}")
round() returns the closest multiple of 10 to the power minus ndigits, so there is no chance you will get 8.98 or 8.97 from it; see the Python documentation for round().
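A quick demonstration of what round() does at different ndigits:

x = 8.972990688205408e-05
print(round(x))      # 0        (nearest integer)
print(round(x, 2))   # 0.0      (nearest multiple of 0.01)
print(round(x, 7))   # 8.97e-05 (nearest multiple of 1e-07)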

Floating point addition with LSB error

I'm implementing a hardware double-precision adder in Verilog. During the verification phase, when I compare my hardware output to MATLAB (or C) double-precision addition results, I found some weird cases where the LSB does not match, taking into account that I'm using the same rounding mode (round to nearest even). My question is about the accuracy of the C calculation: is it truly accurate in its rounding, or is it limited by the CPU architecture (32 or 64 bits)?
Here's an example,
A = 0x62a5a1c59bd10037 = 1.5944933396238637e+167
B = 0x62724bc40659bf0c = 1.685748657333889e+166 = 0.1685748657333889e+167
The correct output (just by doing the addition of the above real numbers manually)
= 1.7630682053572526e+167 = 0x62a7eb3e1c9c3819 (this matches my hardware)
When I try doing A+B in C, the result is equal to
= 1.7630682053572525e+167 = 0x62a7eb3e1c9c3818
When I try this application to check the intermediate operations
http://www.ecs.umass.edu/ece/koren/arith/simulator/FPAdd/
I can see from mantissa addition that C is not doing the rounding correctly (round to nearest even). In this case the mantissa should be rounded by adding one. Any idea why this is happening?
The operation of http://www.ecs.umass.edu/ece/koren/arith/simulator/FPAdd/ is correct. The last round to nearest even performs a downward rounding:

A+B = +1.0111111010110011111000011100100111000011100000011000|10 × 2^555

The |10 part to be discarded is exactly in the middle, so the result chooses a final bit of 0 (even) instead of rounding up to 1.
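The C result can be reproduced bit-for-bit with a short Python check (Python floats are IEEE-754 doubles on all common platforms):

import struct

def hex_to_double(h):
    # Reinterpret 16 hex digits as the bit pattern of a big-endian double
    return struct.unpack('>d', bytes.fromhex(h))[0]

a = hex_to_double('62a5a1c59bd10037')
b = hex_to_double('62724bc40659bf0c')
print(struct.pack('>d', a + b).hex())  # 62a7eb3e1c9c3818, matching C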
