Python code yielding different result for same numerical value, depending on inclusion of precision point

Python code yielding different result for same numerical value, depending on inclusion of precision point - python-3.x

I defined a function which returns a third order polynomial function for either a value, a list or a np.array:
def two_d_third_order(x, a, b, c, d):
return a + np.multiply(b, x) + np.multiply(c, np.multiply(x, x)) + np.multiply(d, np.multiply(x, np.multiply(x, x)))
The issue I noticed is, however, when I use "two_d_third_order" on the following two inputs:
1500
1500.0
With (a, b, c, d) = (1.20740028e+00, -2.93682465e-03, 2.29938078e-06, -5.09134552e-10), I get two different results:
2.4441
0.2574
, respectively. I don't know how this is possible, and any help would be appreciated.
I tried several inputs, and somehow the inclusion of a floating point on certain values (despite representing the same numerical value) changes the end result.

Python uses implicit data type conversions. When you use only integers (like 1500), there is a loss of precision in all subsequent operations. Whereas when you pass it a float or double (like 1500.0), subsequent operations are performed with the associated datatype, i.e in this case with higher precision.
This is not a "bug" so to speak, but generally how Python operates without the explicit declaration of data types. Languages like C and C++ require explicit data type declarations and explicit data type casting to ensure operations are performed in the prescribed precision formats. Can be a boon or a bane depending on usage.

1500 * 2_250_000 is 3_375_000_000 overflowing the range on int32:
print(type(np.multiply(1500, 2250000)))
print(np.multiply(1500, 2250000))
giving:
<class 'numpy.int32'>
-919967296
Where as floats use a much larger container.
print(type(np.multiply(1500.0, 2250000.0)))
print(np.multiply(1500.0, 2250000.0))
giving:
<class 'numpy.float64'>
3375000000.0
Try casting your input to a larger int.
(a, b, c, d) = (1.20740028e+00, -2.93682465e-03, 2.29938078e-06, -5.09134552e-10)
x1 = np.int64(1500)
x2 = 1500.0
print(two_d_third_order(x1, a, b, c, d) == two_d_third_order(x2, a, b, c, d))

Related

How to calculate modulus value for a set of values in python?

In one material I found one formula to calculate Precision as below
Here a and b are set of values. After many search in internet I found that modulus means remainder value or absolute value. Here I take modulus as absolute value and my python code for the above formula is as below
import numpy as np
def intersection(lst1, lst2):
return list(set(lst1) & set(lst2))
a = [7,21]
b = [11, 7, 27, 21]
a_intersect_b=intersection(a,b)
print(" a_intersect_b : ",a_intersect_b)
mod_a_intersect_b=[abs(x) for x in a_intersect_b]
print("|a_intersect_b| : ",mod_a_intersect_b)
mod_a=[abs(x) for x in a]
print("|a| : ",mod_a)
numerator=np.array(mod_a_intersect_b, dtype=np.float)
denominator=np.array(mod_a, dtype=np.float)
print(" mod_a_intersect_b / mod_a : ", numerator/denominator)
Here I get 2 output values. But in the material and in general the precision is a single value. If the list size increases then the output values also increases. Finally I found that I misunderstood the modulus meaning here. Guide me to get the single precision value as per the above formula. Thanks in advance.
Note: In the formula a and b are set of values. So I used list in my code. Also guide me if I use other option to mention set of values in python then I can get single precision value.

As #Hoog mentioned in gis comment, the modulus operation in the case of precision means a cardinality of some set (just a number of elements of the set), so you can define a precision as the following:
def precision(a, b):
"""
a: set, relevant items
b: set, retrieved items
returns: float, precision value
"""
return len(a & b) / len(a)
len(a) returns nuber of elements of the set, i.e. cardinality, |a| operation.
If a, b is lists, just wrap them in sets first:
def precision(a, b):
"""
a: set, relevant items
b: set, retrieved items
returns: float, precision value
"""
a, b = set(a), set(b)
return len(a & b) / len(a)
Also, in data science and related areas precision is a metric which calculates ratio 'true positives' / ('true positives' + 'false positives'). It's the same thing described in other terms - but standart implementations of precision won't help you.

math.sqrt function python gives same result for two different values [duplicate]

Why does the math module return the wrong result?
First test
A = 12345678917
print 'A =',A
B = sqrt(A**2)
print 'B =',int(B)
Result
A = 12345678917
B = 12345678917
Here, the result is correct.
Second test
A = 123456758365483459347856
print 'A =',A
B = sqrt(A**2)
print 'B =',int(B)
Result
A = 123456758365483459347856
B = 123456758365483467538432
Here the result is incorrect.
Why is that the case?

Because math.sqrt(..) first casts the number to a floating point and floating points have a limited mantissa: it can only represent part of the number correctly. So float(A**2) is not equal to A**2. Next it calculates the math.sqrt which is also approximately correct.
Most functions working with floating points will never be fully correct to their integer counterparts. Floating point calculations are almost inherently approximative.
If one calculates A**2 one gets:
>>> 12345678917**2
152415787921658292889L
Now if one converts it to a float(..), one gets:
>>> float(12345678917**2)
1.5241578792165828e+20
But if you now ask whether the two are equal:
>>> float(12345678917**2) == 12345678917**2
False
So information has been lost while converting it to a float.
You can read more about how floats work and why these are approximative in the Wikipedia article about IEEE-754, the formal definition on how floating points work.

The documentation for the math module states "It provides access to the mathematical functions defined by the C standard." It also states "Except when explicitly noted otherwise, all return values are floats."
Those together mean that the parameter to the square root function is a float value. In most systems that means a floating point value that fits into 8 bytes, which is called "double" in the C language. Your code converts your integer value into such a value before calculating the square root, then returns such a value.
However, the 8-byte floating point value can store at most 15 to 17 significant decimal digits. That is what you are getting in your results.
If you want better precision in your square roots, use a function that is guaranteed to give full precision for an integer argument. Just do a web search and you will find several. Those usually do a variation of the Newton-Raphson method to iterate and eventually end at the correct answer. Be aware that this is significantly slower that the math module's sqrt function.
Here is a routine that I modified from the internet. I can't cite the source right now. This version also works for non-integer arguments but just returns the integer part of the square root.
def isqrt(x):
"""Return the integer part of the square root of x, even for very
large values."""
if x < 0:
raise ValueError('square root not defined for negative numbers')
n = int(x)
if n == 0:
return 0
a, b = divmod(n.bit_length(), 2)
x = (1 << (a+b)) - 1
while True:
y = (x + n//x) // 2
if y >= x:
return x
x = y

If you want to calculate sqrt of really large numbers and you need exact results, you can use sympy:
import sympy
num = sympy.Integer(123456758365483459347856)
print(int(num) == int(sympy.sqrt(num**2)))

The way floating-point numbers are stored in memory makes calculations with them prone to slight errors that can nevertheless be significant when exact results are needed. As mentioned in one of the comments, the decimal library can help you here:
>>> A = Decimal(12345678917)
>>> A
Decimal('123456758365483459347856')
>>> B = A.sqrt()**2
>>> B
Decimal('123456758365483459347856.0000')
>>> A == B
True
>>> int(B)
123456758365483459347856
I use version 3.6, which has no hardcoded limit on the size of integers. I don't know if, in 2.7, casting B as an int would cause overflow, but decimal is incredibly useful regardless.

Robust linear interpolation

Given two segment endpoints A and B (in two dimensions), I would like to perform linear interpolation based on a value t, i.e.:
C = A + t(B-A)
In the ideal world, A, B and C should be collinear. However, we are operating with limited floating-point here, so there will be small deviations. To work around numerical issues with other operations I am using robust adaptive routines originally created by Jonathan Shewchuk. In particular, Shewchuk implements an orientation function orient2d that uses adaptive precision to exactly test the orientation of three points.
Here my question: is there a known procedure how the interpolation can be computed using the floating-point math, so that it lies exactly on the line between A and B? Here, I care less about the accuracy of the interpolation itself and more about the resulting collinearity. In another terms, its ok if C is shifted around a bit as long as collinearity is satisfied.

The bad news
The request can't be satisfied. There are values of A and B for which there is NO value of t other than 0 and 1 for which lerp(A, B, t) is a float.
A trivial example in single precision is x1 = 12345678.f and x2 = 12345679.f. Regardless of the values of y1 and y2, the required result must have an x component between 12345678.f and 12345679.f, and there's no single-precision float between these two.
The (sorta) good news
The exact interpolated value, however, can be represented as the sum of 5 floating-point values (vectors in the case of 2D): one for the formula's result, one for the error in each operation [1] and one for multiplying the error by t. I'm not sure if that will be useful to you. Here's a 1D C version of the algorithm in single precision that uses fused multiply-add to calculate the product error, for simplicity:
#include <math.h>
float exact_sum(float a, float b, float *err)
{
float sum = a + b;
float z = sum - a;
*err = a - (sum - z) + (b - z);
return sum;
}
float exact_mul(float a, float b, float *err)
{
float prod = a * b;
*err = fmaf(a, b, -prod);
return prod;
}
float exact_lerp(float A, float B, float t,
float *err1, float *err2, float *err3, float *err4)
{
float diff = exact_sum(B, -A, err1);
float prod = exact_mul(diff, t, err2);
*err1 = exact_mul(*err1, t, err4);
return exact_sum(A, prod, err3);
}
In order for this algorithm to work, operations need to conform to IEEE-754 semantics in round-to-nearest mode. That's not guaranteed by the C standard, but the GNU gcc compiler can be instructed to do so, at least in processors supporting SSE2 [2][3].
It is guaranteed that the arithmetic addition of (result + err1 + err2 + err3 + err4) will be equal to the desired result; however, there is no guarantee that the floating-point addition of these quantities will be exact.
To use the above example, exact_lerp(12345678.f, 12345679.f, 0.300000011920928955078125f, &err1, &err2, &err3, &err4) returns a result of 12345678.f and err1, err2, err3 and err4 are 0.0f, 0.0f, 0.300000011920928955078125f and 0.0f respectively. Indeed, the correct result is 12345678.300000011920928955078125 which can't be represented as a single-precision float.
A more convoluted example: exact_lerp(0.23456789553165435791015625f, 7.345678806304931640625f, 0.300000011920928955078125f, &err1, &err2, &err3, &err4) returns 2.3679010868072509765625f and the errors are 6.7055225372314453125e-08f, 8.4771045294473879039287567138671875e-08f, 1.490116119384765625e-08f and 2.66453525910037569701671600341796875e-15f. These numbers add up to the exact result, which is 2.36790125353468550173374751466326415538787841796875 and can't be exactly stored in a single-precision float.
All numbers in the examples above are written using their exact values, rather than a number that approximates to them. For example, 0.3 can't be represented exactly as a single-precision float; the closest one has an exact value of 0.300000011920928955078125 which is the one I've used.
It might be possible that if you calculate err1 + err2 + err3 + err4 + result (in that order), you get an approximation that is considered collinear in your use case. Perhaps worth a try.
References
[1] Graillat, Stef (2007). Accurate Floating Point Product and Exponentiation.
[2] Enabling strict floating point mode in GCC
[3] Semantics of Floating Point Math in GCC

Random Color Generator (Python 3)

How would I generate random (R,G,B) colors with minimum components of .5 in a tuple? I'm new at this and fairly confused.

Since this is a homework assignment, it's probably best to avoid just giving you the answer, so here's some basics. You can make a tuple with commas (parens only needed when commas might have other meanings, e.g. in function calls, literals of other types, etc., or when the lack of parens would lead to incorrect order of operations), so to make a tuple of three elements, you can just do:
threetup = a, b, c # Or (a, b, c)
where a, b, and c can be replaced with any source of a value.
For generating the random RGB components, I suggest you take a look at the random module, specifically random.uniform for getting random floating point values.

Somthing like this:
import random
def rand_color():
return "#" + "".join(random.sample("0123456789abcdef", 6))
You need to take care of the ".5 minimum" requirement.

Can good type systems distinguish between matrices in different bases?

My program (Hartree-Fock/iterative SCF) has two matrices F and F' which are really the same matrix expressed in two different bases. I just lost three hours of debugging time because I accidentally used F' instead of F. In C++, the type-checker doesn't catch this kind of error because both variables are Eigen::Matrix<double, 2, 2> objects.
I was wondering, for the Haskell/ML/etc. people, whether if you were writing this program you would have constructed a type system where F and F' had different types? What would that look like? I'm basically trying to get an idea how I can outsource some logic errors onto the type checker.
Edit: The basis of a matrix is like the unit. You can say 1L or however many gallons, they both mean the same thing. Or, to give a vector example, you can say (0,1) in Cartesian coordinates or (1,pi/2) in polar. But even though the meaning is the same, the numerical values are different.
Edit: Maybe units was the wrong analogy. I'm not looking for some kind of record type where I can specify that the first field will be litres and the second gallons, but rather a way to say that this matrix as a whole, is defined in terms of some other matrix (the basis), where the basis could be any matrix of the same dimensions. E.g., the constructor would look something like mkMatrix [[1, 2], [3, 4]] [[5, 6], [7, 8]] and then adding that object to another matrix would type-check only if both objects had the same matrix as their second parameters. Does that make sense?
Edit: definition on Wikipedia, worked examples

This is entirely possible in Haskell.
Statically checked dimensions
Haskell has arrays with statically checked dimensions, where the dimensions can be manipulated and checked statically, preventing indexing into the wrong dimension. Some examples:
This will only work on 2-D arrays:
multiplyMM :: Array DIM2 Double -> Array DIM2 Double -> Array DIM2 Double
An example from repa should give you a sense. Here, taking a diagonal requires a 2D array, returns a 1D array of the same type.
diagonal :: Array DIM2 e -> Array DIM1 e
or, from Matt sottile's repa tutorial, statically checked dimensions on a 3D matrix transform:
f :: Array DIM3 Double -> Array DIM2 Double
f u =
let slabX = (Z:.All:.All:.(0::Int))
slabY = (Z:.All:.All:.(1::Int))
u' = (slice u slabX) * (slice u slabX) +
(slice u slabY) * (slice u slabY)
in
R.map sqrt u'
Statically checked units
Another example from outside of matrix programming: statically checked units of dimension, making it a type error to confuse e.g. feet and meters, without doing the conversion.
Prelude> 3 *~ foot + 1 *~ metre
1.9144 m
or for a whole suite of SI units and quanities.
E.g. can't add things of different dimension, such as volumes and lengths:
> 1 *~ centi litre + 2 *~ inch
Error:
Expected type: Unit DVolume a1
Actual type: Unit DLength a0
So, following the repa-style array dimension types, I'd suggest adding a Base phantom type parameter to your array type, and using that to distinguish between bases. In Haskell, the index Dim
type argument gives the rank of the array (i.e. its shape), and you could do similarly.
Or, if by base you mean some dimension on the units, using dimensional types.
So, yep, this is almost a commodity technique in Haskell now, and there's some examples of designing with types like this to help you get started.

This is a very good question. I don't think you can encode the notion of a basis in most type systems, because essentially anything that the type checker does needs to be able to terminate, and making judgments about whether two real-valued vectors are equal is too difficult. You could have (2 v_1) + (2 v_2) or 2 (v_1 + v_2), for example. There are some languages which use dependent types [ wikipedia ], but these are relatively academic.
I think most of your debugging pain would be alleviated if you simply encoded the bases in which you matrix works along with the matrix. For example,
newtype Matrix = Matrix { transform :: [[Double]],
srcbasis :: [Double], dstbasis :: [Double] }
and then, when you M from basis a to b with N, check that N is from b to c, and return a matrix with basis a to c.
NOTE -- it seems most people here have programming instead of math background, so I'll provide short explanation here. Matrices are encodings of linear transformations between vector spaces. For example, if you're encoding a rotation by 45 degrees in R^2 (2-dimensional reals), then the standard way of encoding this in a matrix is saying that the standard basis vector e_1, written "[1, 0]", is sent to a combination of e_1 and e_2, namely [1/sqrt(2), 1/sqrt(2)]. The point is that you can encode the same rotation by saying where different vectors go, for example, you could say where you're sending [1,1] and [1,-1] instead of e_1=[1,0] and e_2=[0,1], and this would have a different matrix representation.
Edit 1
If you have a finite set of bases you are working with, you can do it...
{-# LANGUAGE EmptyDataDecls #-}
data BasisA
data BasisB
data BasisC
newtype Matrix a b = Matrix { coefficients :: [[Double]] }
multiply :: Matrix a b -> Matrix b c -> Matrix a c
multiply (Matrix a_coeff) (Matrix b_coeff) = (Matrix multiplied) :: Matrix a c
where multiplied = undefined -- your algorithm here
Then, in ghci (the interactive Haskell interpreter),
*Matrix> let m = Matrix [[1, 2], [3, 4]] :: Matrix BasisA BasisB
*Matrix> m `multiply` m
<interactive>:1:13:
Couldn't match expected type `BasisB'
against inferred type `BasisA'
*Matrix> let m2 = Matrix [[1, 2], [3, 4]] :: Matrix BasisB BasisC
*Matrix> m `multiply` m2
-- works after you finish defining show and the multiplication algorithm

While I realize this does not strictly address the (clarified) question – my apologies – it seems relevant at least in relation to Don Stewart's popular answer...
I am the author of the Haskell dimensional library that Don referenced and provided examples from. I have also been writing – somewhat under the radar – an experimental rudimentary linear algebra library based on dimensional. This linear algebra library statically tracks the sizes of vectors and matrices as well as the physical dimensions ("units") of their elements on a per element basis.
This last point – tracking physical dimensions on a per element basis – is rather challenging and perhaps overkill for most uses, and one could even argue that it makes little mathematical sense to have quantities of different physical dimensions as elements in any given vector/matrix. However, some linear algebra applications of interest to me such as kalman filtering and weighted least squares estimation typically use heterogeneous state vectors and covariance matrices.
Using a Kalman filter as an example, consider a state vector x = [d, v] which has physical dimensions [L, LT^-1]. The next (future) state vector is predicted by multiplication by the state transition matrix F, i.e.: x' = F x_. Clearly for this equation to make sense F cannot be arbitrary but must have size and physical dimensions [[1, T], [T^-1, 1]]. The predict_x' function below statically ensures that this relationship holds:
predict_x' :: (Num a, MatrixVector f x x) => Mat f a -> Vec x a -> Vec x a
predict_x' f x_ = f |*< x_
(The unsightly operator |*< denotes multiplication of a matrix on the left with a vector on the right.)
More generally, for an a priori state vector x_ of arbitrary size and with elements of arbitrary physical dimensions, passing a state transition matrix f with "incompatible" size and/or physical dimensions to predict_x' will cause a compile time error.

In F# (which originally evolved from OCaml), you can use units of measure. Andrew Kenned, who designed the feature (and also created a very interesting theory behind it) has a great series of articles that demonstrate it.
This can quite likely be used in your scenario - although I don't fully understand the question. For example, you can declare two unit types like this:
[<Measure>] type litre
[<Measure>] type gallon
Adding litres and gallons gives you a compile time error:
1.0<litre> + 1.0<gallon> // Error!
F# doesn't automatically insert conversion between different units, but you can write a conversion function:
let toLitres gal = gal * 3.78541178<litre/gallon>
1.0<litre> + (toLitres 1.0<gallon>)
The beautiful thing about units of measure in F# is that they are automatically inferred and functions are generic. If you multiply 1.0<gallon> * 1.0<gallon>, the result is 1.0<gallon^2>.
People have used this feature for various things - ranging from conversion of virtual meters to screen pixels (in solar system simulations) to converting currencies (dollars in financial systems). Although I'm not expert, it is quite likely that you could use it in some way for your problem domain too.

If it's expressed in a different base, you can just add a template parameter to act as the base. That will differentiate those types. A float is a float is a float- if you don't want two float values to be the same if they actually have the same value, then you need to tell the type system about it.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string