I saw this piece of J code
(**+)&.+.
in the RosettaCode FFT section. It is supposed to clean up the insignificant digits of an FFT result. For example
(**+)&.+. 4e_16j2
gives
0j2
It is similar to the Chop function of Mathematica.
However
(**+)&.+. _4j_2
gives
4j2 (instead of _4j_2)
which is obviously incorrect.
The question is what is the correct way in J to chop off insignificant digits?
The monad + (as opposed to the dyad +) is "complex conjugate", which is the culprit in producing 4j2 as opposed to _4j_2.
The editor responsible for (**+)&.+. on RosettaCode probably intended to use |, absolute value, in place of +, thus:
(**|)&.+. _4j_2 4e_16j2
_4j_2 0j2
round in the "numeric" package is [ * [: <. 0.5 + %~. You can use it as follows:
require 'numeric'
(0.01&round)&.+. _1.5j_4.6 4e_16j2 2j4e_16
_1.5j_4.6 0j2 2
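The code given in the question, corrected to use | instead of +, is (**|)&.+. and it uses a byproduct of operating on numbers to round them. It takes each part of a complex number (&.+.) and multiplies (*) its absolute value (|) by its sign (*). The byproduct is that the sign of a part as small as 4e_16 comes out as 0, as the 4e_16j2 example shows (as far as I know this is because J's signum applies the comparison tolerance), so that part is multiplied by 0. You could achieve the same sort of effect by adding and subtracting a constant from your number with something like 10j10 -~ 10j10 + ].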
[ * [: <. 0.5 + %~ doesn't use any byproducts, but rather directly rounds the number to the desired precision. %~ divides y by x, so that if you're rounding 0.3579 to two decimal places, indicated by an x of 0.01, your first step gets you 35.79. You then add 0.5 (0.5 +) and take the floor ([: <.), which is the same as rounding to zero places (35.79 + 0.5 = 36.29, the floor of which is 36). The final step is to multiply by x ([ *) to undo what was done with %~.
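For readers who do not read J, the same steps can be sketched in Python. The names round_to and chop are made up for this illustration and are not part of J's "numeric" package; chop rounds the real and imaginary parts independently, which mirrors applying round under +. rather than a complex floor.

```python
import math

def round_to(step, value):
    """Round value to the nearest multiple of step (halves round up).

    Mirrors [ * [: <. 0.5 + %~ : divide by step, add 0.5, floor, multiply back.
    """
    return step * math.floor(0.5 + value / step)

def chop(step, z):
    """Round the real and imaginary parts of z independently (hypothetical helper)."""
    return complex(round_to(step, z.real), round_to(step, z.imag))

print(round_to(0.01, 0.3579))          # approximately 0.36
print(chop(0.01, complex(4e-16, 2)))   # the tiny real part becomes 0
print(chop(0.01, complex(-4, -2)))     # the signs are preserved
```

Because the result is step times an integer, the rounded values may still carry the usual last-bit floating-point error.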
While it is tempting to create a complex version of round with [ * [: <. 0.5j0.5 + %~, using <. on a complex number produces the complex floor, which is probably not what you are after. If you expect the real and imaginary components to be rounded independently, apply round under +. instead. The following gives a taste of how the complex floor differs from taking the floor of each part of the complex number:
<. 0.7 0j0.7 0.6j0.7 0.7j0.6
0 0 0j1 1
This helps explain the following:
1 ([ * [: <. 0.5j0.5 + %~) 0.2 0j0.2 0.1j0.2 0.2j0.1
1 0j1 0j1 1
"Rounding" 0.2 to 1 caught me off-guard, but it is because 0.2 + 0.5j0.5 = 0.7j0.5, and <. 0.7j0.5 is 1 (the complex floor). The same goes for 0j0.2 "rounding" to 0j1.
If you just want the nearest number where both parts of the complex number are integers, you can use ([: <. 0.5 + ])&.+.:
([: <. 0.5 + ])&.+. 0.2 0j0.2 0.1j0.2 0.2j0.1
0 0 0 0
([: <. 0.5 + ])&.+. 0.7 0j0.7 0.6j0.7 0.7j0.6
1 0j1 1j1 1j1
The one I've been using is (**#|) which does not have this problem.
Specifically in this case, this seems to do what you want:
(**#|)&.+.
I am writing a program where I need to delete duplicate points stored in a matrix. The problem is that when it comes to check whether those points are in the matrix, MATLAB can't recognize them in the matrix although they exist.
In the following code, intersections function gets the intersection points:
[points(:,1), points(:,2)] = intersections(...
obj.modifiedVGVertices(1,:), obj.modifiedVGVertices(2,:), ...
[vertex1(1) vertex2(1)], [vertex1(2) vertex2(2)]);
The result:
>> points
points =
12.0000 15.0000
33.0000 24.0000
33.0000 24.0000
>> vertex1
vertex1 =
12
15
>> vertex2
vertex2 =
33
24
Two points (vertex1 and vertex2) should be eliminated from the result. It should be done by the below commands:
points = points((points(:,1) ~= vertex1(1)) | (points(:,2) ~= vertex1(2)), :);
points = points((points(:,1) ~= vertex2(1)) | (points(:,2) ~= vertex2(2)), :);
After doing that, we have this unexpected outcome:
>> points
points =
33.0000 24.0000
The outcome should be an empty matrix. As you can see, the first (or second?) pair of [33.0000 24.0000] has been eliminated, but not the second one.
Then I checked these two expressions:
>> points(1) ~= vertex2(1)
ans =
0
>> points(2) ~= vertex2(2)
ans =
1 % <-- It means 24.0000 is not equal to 24.0000?
What is the problem?
More surprisingly, I made a new script that has only these commands:
points = [12.0000 15.0000
33.0000 24.0000
33.0000 24.0000];
vertex1 = [12 ; 15];
vertex2 = [33 ; 24];
points = points((points(:,1) ~= vertex1(1)) | (points(:,2) ~= vertex1(2)), :);
points = points((points(:,1) ~= vertex2(1)) | (points(:,2) ~= vertex2(2)), :);
The result as expected:
>> points
points =
Empty matrix: 0-by-2
The problem you're having relates to how floating-point numbers are represented on a computer. A more detailed discussion of floating-point representations appears towards the end of my answer (The "Floating-point representation" section). The TL;DR version: because computers have finite amounts of memory, numbers can only be represented with finite precision. Thus, the accuracy of floating-point numbers is limited to a certain number of decimal places (about 16 significant digits for double-precision values, the default used in MATLAB).
Actual vs. displayed precision
Now to address the specific example in the question... while 24.0000 and 24.0000 are displayed in the same manner, it turns out that they actually differ by very small decimal amounts in this case. You don't see it because MATLAB only displays 4 digits after the decimal point by default, keeping the overall display neat and tidy. If you want to see the full precision, you should either issue the format long command or view a hexadecimal representation of the number:
>> pi
ans =
3.1416
>> format long
>> pi
ans =
3.141592653589793
>> num2hex(pi)
ans =
400921fb54442d18
Initialized values vs. computed values
Since there are only a finite number of values that can be represented for a floating-point number, it's possible for a computation to result in a value that falls between two of these representations. In such a case, the result has to be rounded off to one of them. This introduces a small machine-precision error. This also means that initializing a value directly or by some computation can give slightly different results. For example, the value 0.1 doesn't have an exact floating-point representation (i.e. it gets slightly rounded off), and so you end up with counter-intuitive results like this due to the way round-off errors accumulate:
>> a=sum([0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1]); % Sum 10 0.1s
>> b=1; % Initialize to 1
>> a == b
ans =
logical
0 % They are unequal!
>> num2hex(a) % Let's check their hex representation to confirm
ans =
3fefffffffffffff
>> num2hex(b)
ans =
3ff0000000000000
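The same experiment can be reproduced outside MATLAB. The sketch below uses Python and adds 0.1 ten times in an explicit loop (recent Python versions use a compensated algorithm inside the built-in sum, which would hide the effect). The helper name to_hex is made up for this illustration.

```python
import struct

def to_hex(x):
    """Big-endian hex dump of a double, comparable to MATLAB's num2hex."""
    return struct.pack('>d', x).hex()

a = 0.0
for _ in range(10):      # add 0.1 ten times, one rounding error per step
    a += 0.1
b = 1.0

print(a == b)            # False
print(to_hex(a))         # 3fefffffffffffff
print(to_hex(b))         # 3ff0000000000000
```

The two bit patterns differ only in the last place, which is exactly the difference the hex output above shows.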
How to correctly handle floating-point comparisons
Since floating-point values can differ by very small amounts, any comparisons should be done by checking that the values are within some range (i.e. tolerance) of one another, as opposed to exactly equal to each other. For example:
a = 24;
b = 24.000001;
tolerance = 0.001;
if abs(a-b) < tolerance, disp('Equal!'); end
will display "Equal!".
You could then change your code to something like:
points = points((abs(points(:,1)-vertex1(1)) > tolerance) | ...
(abs(points(:,2)-vertex1(2)) > tolerance),:)
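As a language-neutral sketch of the same tolerance-based filtering, here it is in Python. The function name remove_near, the sample data and the tolerance of 1e-3 are assumptions chosen for illustration; pick a tolerance that suits the scale of your own coordinates.

```python
def remove_near(points, vertex, tol=1e-3):
    """Keep only the points that differ from vertex by more than tol in x or in y."""
    return [p for p in points
            if abs(p[0] - vertex[0]) > tol or abs(p[1] - vertex[1]) > tol]

# The last point is "almost" (33, 24), as a computed intersection might be.
points = [(12.0, 15.0), (33.0, 24.0), (33.0, 24.0 + 1e-12)]

# Exact comparison misses the almost-equal point.
exact = [p for p in points if p != (12.0, 15.0) and p != (33.0, 24.0)]
print(len(exact))   # 1

for vertex in [(12, 15), (33, 24)]:
    points = remove_near(points, vertex)
print(points)       # []
```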
Floating-point representation
A good overview of floating-point numbers (and specifically the IEEE 754 standard for floating-point arithmetic) is What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg.
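A binary floating-point number is actually represented by three integers: a sign bit s, a significand (or coefficient/fraction) b, and an exponent e. For double-precision floating-point format, each number is represented by 64 bits laid out in memory as follows: 1 sign bit, then 11 exponent bits, then 52 fraction bits b_51 ... b_0.
The real value of a normal (non-zero, non-denormal) number can then be found with the following formula:
$(-1)^{s} \times (1.b_{51} b_{50} \ldots b_{0})_2 \times 2^{e - 1023}$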
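As a rough sketch of this decoding, the following Python code splits a double into its three fields and rebuilds the value, assuming the standard IEEE 754 layout of 1 sign bit, 11 exponent bits (biased by 1023) and 52 fraction bits. The function names decode_double and rebuild are made up for this illustration, and rebuild only handles normal numbers.

```python
import math
import struct

def decode_double(x):
    """Split a float into its IEEE 754 sign, biased exponent and fraction fields."""
    bits = int.from_bytes(struct.pack('>d', x), 'big')
    sign = bits >> 63                      # 1 bit
    exponent = (bits >> 52) & 0x7FF        # 11 bits, biased by 1023
    fraction = bits & ((1 << 52) - 1)      # 52 explicitly stored bits
    return sign, exponent, fraction

def rebuild(sign, exponent, fraction):
    """Recompute the value of a normal number from the three fields."""
    return (-1) ** sign * (1 + fraction / 2 ** 52) * 2.0 ** (exponent - 1023)

s, e, f = decode_double(math.pi)
print(s, e, hex(f))                  # 0 1024 0x921fb54442d18
print(rebuild(s, e, f) == math.pi)   # True
```

The fields printed for pi are the same bits as the num2hex(pi) output 400921fb54442d18 shown earlier.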
This format allows for number representations in the range 10^-308 to 10^308. For MATLAB you can get these limits from realmin and realmax:
>> realmin
ans =
2.225073858507201e-308
>> realmax
ans =
1.797693134862316e+308
Since there are a finite number of bits used to represent a floating-point number, there are only so many finite numbers that can be represented within the above given range. Computations will often result in a value that doesn't exactly match one of these finite representations, so the values must be rounded off. These machine-precision errors make themselves evident in different ways, as discussed in the above examples.
In order to better understand these round-off errors it's useful to look at the relative floating-point accuracy provided by the function eps, which quantifies the distance from a given number to the next larger floating-point representation:
>> eps(1)
ans =
2.220446049250313e-16
>> eps(1000)
ans =
1.136868377216160e-13
Notice that the precision is relative to the size of a given number being represented; larger numbers will have larger distances between floating-point representations, and will thus have fewer digits of precision following the decimal point. This can be an important consideration with some calculations. Consider the following example:
>> format long % Display full precision
>> x = rand(1, 10); % Get 10 random values between 0 and 1
>> a = mean(x) % Take the mean
a =
0.587307428244141
>> b = mean(x+10000)-10000 % Take the mean at a different scale, then shift back
b =
0.587307428244458
Note that when we shift the values of x from the range [0 1] to the range [10000 10001], compute a mean, then subtract the mean offset for comparison, we get a value that differs for the last 3 significant digits. This illustrates how an offset or scaling of data can change the accuracy of calculations performed on it, which is something that has to be accounted for with certain problems.
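The spacing between neighbouring doubles can also be checked outside MATLAB. A minimal sketch in Python, assuming Python 3.9 or later, where math.ulp plays a role similar to eps(x) for positive x:

```python
import math

print(math.ulp(1.0))      # 2.220446049250313e-16
print(math.ulp(1000.0))   # 1.1368683772161603e-13

# An increment much smaller than the local spacing is lost entirely.
print(1000.0 + math.ulp(1000.0) / 4 == 1000.0)   # True
```

This is why values near 10000 keep fewer digits after the decimal point than values near 1.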
Look at this article: The Perils of Floating Point. Though its examples are in FORTRAN, it applies to virtually any modern programming language, including MATLAB. Your problem (and the solution for it) is described in the "Safe Comparisons" section.
type
format long g
This command will show the FULL value of the number. It's likely to be something like 24.00000021321 != 24.00000123124
Try writing
0.1 + 0.1 + 0.1 == 0.3.
Warning: You might be surprised about the result!
Maybe the two numbers are really 24.0 and 24.000000001 but you're not seeing all the decimal places.
Check out the Matlab EPS function.
Matlab uses floating point math up to 16 digits of precision (only 5 are displayed).
I'm trying to understand the RGB to YUV conversion equation,
and I've got an implementation from https://sistenix.com/rgb2ycbcr.html.
But I can't understand how the expression below was derived, especially the R term in (4):
R<<6 + R<<1 ?
How can (65.7388*R)/256 be represented as R<<6 + R<<1 ?
You are missing a part, (65.7388*R)/256 becomes (R<<6 + R<<1)>>8
The steps are actually pretty easy: approximating to the nearest integer (65.7388*R)/256 becomes (66*R)/256 that can be written as (64*R + 2*R)/256. A multiplication by 2 is equal to a shift to the left thus 2*R becomes R<<1 and 64*R becomes R<<6. In the same way, a division by 2 is equal to a shift to the right, thus /256 becomes >>8
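The approximation can be checked numerically. Here is a small Python sketch (the function name r_term is made up for this illustration). Note the explicit parentheses: in Python + binds more tightly than <<, so the shifts have to be grouped before they are added.

```python
def r_term(r):
    """Integer approximation of (65.7388 * r) / 256 using only shifts and an add."""
    return ((r << 6) + (r << 1)) >> 8   # (64*r + 2*r) / 256, truncated

for r in (0, 100, 255):
    print(r, r_term(r), 65.7388 * r / 256)

# For every 8-bit input the result equals (66 * r) // 256
# and stays within 1 of the exact real-valued term.
assert all(r_term(r) == (66 * r) // 256 for r in range(256))
assert all(abs(r_term(r) - 65.7388 * r / 256) < 1 for r in range(256))
```

The error comes from two sources: rounding 65.7388 up to 66, and the truncation performed by the final right shift.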
The Python documentation says that "the hexadecimal string 0x3.a7p10 represents the floating-point number (3 + 10./16 + 7./16**2) * 2.0**10, or 3740.0", so:
>>> float.fromhex('0x3.a7p10')
3740.0
then
>>> float.hex(3740.0)
'0x1.d380000000000p+11' (will give different presentation)
My question is how to convert '0x1.d380000000000p+11' into a floating-point number using the calculation formula above, and why float.hex and the classmethod float.fromhex use different representations.
Thankyou....
'0x1.d380000000000p+11' means (1 + 13./16 + 3./16**2 + 8./16**3) * 2.0**11, which is equal to 3740.0. To convert this result, you can run float.fromhex('0x1.d380000000000p+11'), which returns 3740.0 again.
float.hex gives you a normalized representation, which means that the factor in front of the 2**x is between 1 and 2. What the interpreter did was shift the binary point by one position: increase the exponent (from 10 to 11) and halve the factor (0x3.a7 / 2 = 0x1.d38).
In general, in this normalized representation, the factor in front is between 1 and the base. For example, if you do print(2234.2e-34), you get 2.2342e-31. Here the leading factor is between 1 and 10 because e corresponds to 10**x.
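Both expansions can be evaluated by hand to confirm that they describe the same number. A short Python check, using only the digits from the two hex strings in the question:

```python
# 0x3.a7p10: hex digits 3 . a(=10) 7, scaled by 2**10
a = (3 + 10 / 16 + 7 / 16**2) * 2.0**10

# 0x1.d38p+11: hex digits 1 . d(=13) 3 8, scaled by 2**11
b = (1 + 13 / 16 + 3 / 16**2 + 8 / 16**3) * 2.0**11

print(a, b)   # 3740.0 3740.0
print(float.fromhex('0x3.a7p10') == float.fromhex('0x1.d380000000000p+11'))   # True
print((3740.0).hex())   # 0x1.d380000000000p+11
```

All the intermediate fractions are exact in binary, so no rounding is involved in either expression.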
It's cool that 3 * 4 results in 12, and * 4 results in 1, but does using the same primitive for both operations ever provide a benefit? For example, let's say I were to define the following:
SIGNUM =: * : [:
TIMES =: [: : *
If I were to only ever use SIGNUM and TIMES instead of *, would I ever miss out on a clever use of *? That is, x TIMES y seems to be exactly the same as x * y for every x I can imagine (although my imagination is pretty limited in this regard). Is there an x where x * y produces the same result as SIGNUM y?
In case * : [: isn't immediately clear, the following should illustrate:
SIGNUM =: * : [:
TIMES =: [: : *
SIGNUM 4
1
3 TIMES 4
12
* 4
1
3 * 4
12
3 SIGNUM 4
|domain error: SIGNUM
| 3 SIGNUM 4
TIMES 4
|domain error: TIMES
| TIMES 4
Let's write conclusions from the comments down:
There is no direct language-level reason not to use names for primitives
Using names instead of primitives can however harm performance, as special code does not necessarily get triggered. I think this can be remedied by fixing verbs after building them, using f. (Fix).
The reason for having the same name for monadic and dyadic verbs is historical: APL used it before. Most verbs have related actions in their monadic / dyadic versions and inflections (a number of trailing dots and colons).
For instance, ^ can be expressed in traditional notation as pow(x,y) or exp(y), i.e. pow(e,y), where x and y are the left and right arguments and e is Euler's number. Here, the monadic version is the same as the dyadic version, with a sensible default left argument. Different inflections of the same root are all power-related verbs:
- ^. does logarithms (base e for the monad)
- ^: does Power conjunction, applying a verb a variable number of times.
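For readers coming from other languages, the "monad as dyad with a default left argument" idea for ^ can be loosely mimicked with a default parameter. This is only an analogy in Python, with a made-up function name, and not how J implements valence:

```python
import math

def power(y, x=math.e):
    """With x given: x to the power y (like x ^ y). Without x: e to the power y (like ^ y)."""
    return x ** y

print(power(3, 2))   # 8, like 2 ^ 3 in J
print(power(1))      # e to the power 1, like ^ 1 in J
```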
Other relations between monadic and dyadic verbs can also exist, for example $ can be said to get or set the Shape of an array, depending on whether it is used as monad or dyad.
That said, I think that once one gets a bit of experience with J, it becomes easier to spot which valence a verb has based on the sentence it is used in. Examples are:
Monad # Ambiv NB. Mv is always used monadically, Av depends on arguments
Ambiv & Monad
(Dyad Monad) NB. A hook, where verb 1 is always dyadic
(Ambiv Dyad Ambiv) NB. A fork, the middle one is always dyadic
It was probably a mistake to use the same symbols for dyadic and monadic built-ins except for those where the monadic case is a default parameter to the dyad.
TIMES =: 1&$: : *
would be a good definition that doesn't give an error.
As for ambivalent cases,
(3 * TIMES) 4
12
2 (3 * TIMES) 4
24
Another useful ambivalent verb is:
TIMESORSQUARE =: *~
*~ 3
9
2 *~ 3
6
In the ghci terminal, I was computing some equations with Haskell using the sqrt function.
I notice that I would sometimes lose precision in my sqrt result, when it was supposed to be simplified.
For example,
sqrt 4 * sqrt 4 = 4 -- This works well!
sqrt 2 * sqrt 2 = 2.0000000000000004 -- Not the exact result.
Normally, I would expect a result of 2.
Is there a way to get the right simplification result?
How does that work in Haskell?
There are usable precise number libraries in Haskell. Two that come to mind are cyclotomic and the CReal module in the numbers package. (Cyclotomic numbers don't support all the operations on complex numbers that you might like, but square roots of integers and rationals are in the domain.)
>>> import Data.Complex.Cyclotomic
>>> sqrtInteger 2
e(8) - e(8)^3
>>> toReal $ sqrtInteger 2
Just 1.414213562373095 -- Maybe Double
>>> sqrtInteger 2 * sqrtInteger 2
2
>>> toReal $ sqrtInteger 2 * sqrtInteger 2
Just 2.0
>>> rootsQuadEq 3 2 1
Just (-1/3 + 1/3*e(8) + 1/3*e(8)^3,-1/3 - 1/3*e(8) - 1/3*e(8)^3)
>>> let eq x = 3*x*x + 2*x + 1
>>> eq (-1/3 + 1/3*e(8) + 1/3*e(8)^3)
0
>>> import Data.Number.CReal
>>> sqrt 2 :: CReal
1.4142135623730950488016887242096980785697 -- Show instance cuts off at 40th place
>>> sqrt 2 * sqrt 2 :: CReal
2.0
>>> sin 3 :: CReal
0.1411200080598672221007448028081102798469
>>> sin 3*sin 3 + cos 3*cos 3 :: CReal
1.0
You do not lose precision. You have limited precision.
The square root of 2 is a real number but not a rational number, therefore its value cannot be represented exactly by any computer (except by representing it symbolically, of course).
Even if you define a very large precision type, it will not be able to represent the square root of 2 exactly. You may get more precision, but never enough to represent that value exactly (unless you have a computer with infinite memory, in which case please hire me).
The explanation for these results lies in the type of the values returned by the sqrt function:
> :t sqrt
sqrt :: Floating a => a -> a
The Floating a means that the type of the returned value is an instance of the Floating type class.
In ghci an expression like sqrt 2 defaults to Double, whose values are stored as floating point numbers. These sacrifice precision for the sake of covering a larger range of numbers.
Double precision floating point numbers can cover very large ranges but they have limited precision and cannot encode all possible numbers. The square root of 2 (√2) is one such number:
> sqrt 2
1.4142135623730951
> sqrt 2 + 0.000000000000000001
1.4142135623730951
As you see above, double precision floating point numbers are not precise enough to represent √2 + 0.000000000000000001; the result is simply rounded to the closest value that can be expressed in floating point encoding.
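The same behaviour shows up in any language that uses IEEE 754 doubles. As an illustration outside Haskell, here is a small Python sketch of both effects, together with the usual workaround of comparing within a tolerance (math.isclose with its default relative tolerance):

```python
import math

r = math.sqrt(2)

print(r * r)                      # 2.0000000000000004
print(r * r == 2.0)               # False
print(math.isclose(r * r, 2.0))   # True

# 1e-18 is far below the spacing between doubles near 1.414, so it is absorbed.
print(r + 1e-18 == r)             # True
```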
As mentioned by another poster, √2 is an irrational number, which, put simply, means that it requires an infinite number of digits to represent exactly. As such it cannot be represented faithfully using floating point numbers. This leads to errors such as the one you noticed when multiplying it by itself.
You can learn about floating point numbers on the Wikipedia page: http://en.wikipedia.org/wiki/Floating_point.
I especially recommend that you read the answer to this other Stack Overflow question: Floating Point Limitations and follow the mentioned link, it will help you understand what's going on under the hood.
Note that this is a problem in every language, not just Haskell. One way to get rid of it entirely is to use symbolic computation libraries but they are much slower than the floating point numbers offered by CPUs. For many computations the loss of precision due to floating points is not a problem.