Spurious exponents in sympy expressions - rounding

I build sympy expressions (import sympy as sp) by multiplying and dividing other expressions.
Sometimes I find results like (expr1)
1.0*K1**1.11e-16*K2**1.11e-16/K3**1.11e-16*K4**0.8
while the correct expression is (expr2)
1.0*K4**0.8
My first question is: what are possible ways of converting expr1 into expr2?
I can set an arbitrary threshold for the exponents, say 1e-10, below which they are considered spurious.
I could concoct a recursive check of the expression, but perhaps there is something more concise/Pythonic.
For the time being I can assume all powers appear only at the first mul/div level; there are no additions/subtractions involved.
My second question is: what are possible reasons for these spurious powers to appear?
There are no calls to sp.N involved, for example.
The way I build my expressions is too intricate to post here, so I welcome suggestions about possible causes.
I guess this is well within reach of knowledgeable fellows.
In the meantime, I will try conceiving reasonable and representative MCVEs.
Related (some, marginally):
Evaluating coefficients but not exponents in sympy
How to isolate the exponent in Sympy (how to `match` large formulas in Sympy)
Sympy - Find all expressions of a certain level

As noted in the comments this happens because you are using floats rather than exact rational numbers:
In [16]: from sympy import *
In [17]: x = symbols('x')
In [18]: x**0.11/x**0.1/x**0.01
Out[18]: x**(-5.20417042793042e-18)
The problem here is that in (binary) floating point, 0.11, for example, is not exactly equal to 0.1 + 0.01: none of these floats is exactly equal to the intended decimal number, because those numbers cannot be represented exactly in binary.
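For example, in plain Python (no SymPy involved) you can see directly that the float arithmetic does not add up exactly:
>>> 0.1 + 0.01 == 0.11
False
>>> 0.1 + 0.01
0.11000000000000001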
The best approach in general is to use exact rational numbers in cases where you did not mean for a number to be approximate. There are different ways of creating exact rational numbers:
In [19]: x**Rational('0.11')/x**Rational('0.1')/x**Rational('0.01')
Out[19]: 1
In [20]: x**Rational(11,100)/x**Rational(10,100)/x**Rational(1,100)
Out[20]: 1
In [22]: x**(S(11)/100)/x**(S(10)/100)/x**(S(1)/100)
Out[22]: 1
Another approach also suggested in the comments is to use nsimplify:
In [23]: x**0.11
Out[23]: x**0.11
In [24]: nsimplify(x**0.11)
Out[24]: x**(11/100)
In [25]: x**Rational(0.11)
Out[25]: x**(7926335344172073/72057594037927936)
Observe that the exact value of the float 0.11 is not in fact equal to the mathematical number 0.11. What nsimplify does is try to guess what number you really intended, so it actually performs an inexact conversion from float to Rational. This is useful as a convenience but cannot be expected to be reliable in every case, so it is better to use rational numbers in the first place and keep the calculation exact throughout.
Another reason you might have these floats is because of calling evalf:
In [35]: e = sqrt(x)*pi
In [36]: e
Out[36]: π⋅√x
In [37]: e.evalf()
Out[37]: 3.14159265358979*x**0.5
In [38]: nfloat(e)
Out[38]: 3.14159265358979⋅√x
Here the nfloat function can be used to avoid calling evalf on the exponent. This is useful because having floats in exponents of symbols is particularly problematic.
Another approach is to use the dedicated functions for making roots, e.g.:
In [39]: x**0.5
Out[39]: x**0.5
In [40]: x**(1/3)
Out[40]: x**0.333333333333333
In [41]: sqrt(x)
Out[41]: √x
In [42]: cbrt(x)
Out[42]: x**(1/3)
In [43]: root(x, 4)
Out[43]: x**(1/4)
Finally, if you have an expression like the one you showed, it is not possible to use nsimplify at that stage, because it will not guess that the small numbers are supposed to be zero. Instead, you can manually replace the small floats:
In [49]: K1, K2, K3, K4 = symbols('K1:5')
In [50]: e = 1.0*K1**1.11e-16*K2**1.11e-16/K3**1.11e-16*K4**0.8
In [51]: e
Out[51]: 1.0*K1**1.11e-16*K2**1.11e-16*K3**(-1.11e-16)*K4**0.8
In [52]: nsimplify(e)
Out[52]: K1**(111/1000000000000000000)*K2**(111/1000000000000000000)*K4**(4/5)/K3**(111/1000000000000000000)
In [53]: e.replace(lambda t: isinstance(t, Float), lambda f: f if abs(f) > 1e-10 else 0)
Out[53]: 1.0*K4**0.8
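If you need this cleanup in more than one place, the replace call above can be wrapped in a small helper. This is only a sketch: the function name drop_tiny_exponents and its default tolerance are my own choices, with 1e-10 being the arbitrary cut-off mentioned in the question:

from sympy import Float, symbols

def drop_tiny_exponents(expr, tol=1e-10):
    # Replace every Float whose magnitude is below tol with exact zero;
    # powers like K1**0 then automatically simplify to 1.
    return expr.replace(lambda t: isinstance(t, Float),
                        lambda f: f if abs(f) > tol else 0)

K1, K2, K3, K4 = symbols('K1:5')
e = 1.0*K1**1.11e-16*K2**1.11e-16/K3**1.11e-16*K4**0.8
print(drop_tiny_exponents(e))  # 1.0*K4**0.8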

Related

Finding the mean of a distribution

My code generates a number of distributions (I only plotted one below to keep it legible). The Y axis here represents a probability density function and the X axis is a simple array of values.
In more detail.
Y = [0.02046505 0.10756612 0.24319883 0.30336375 0.22071875 0.0890625 0.015625 0 0 0]
And X is generated using np.arange(0,10,1) = [0 1 2 3 4 5 6 7 8 9]
I want to find the mean of this distribution (i.e. where the curve peaks on the X axis), not the mean of the Y values. I know how to use numpy's np.mean to find the mean of Y, but that is not what I need.
By eye, the mean here is about x=3, but I would like to compute it in code to make it more accurate.
Any help would be great.
By definition, the mean (more precisely, the expected value of the random variable) is sum(p(x[j]) * x[j]), where p(x[j]) is the value of the PDF at x[j]. Since you have the PDF values, you can implement this directly:
>>> import numpy as np
>>> Y = np.array(eval(",".join("[0.02046505 0.10756612 0.24319883 0.30336375 0.22071875 0.0890625 0.015625 0 0 0]".split())))
>>> Y
array([0.02046505, 0.10756612, 0.24319883, 0.30336375, 0.22071875,
       0.0890625 , 0.015625  , 0.        , 0.        , 0.        ])
>>> X = np.arange(0, 10)
>>> Y.sum()
1.0
>>> (X * Y).sum()
2.92599253
So the (approximate) answer is 2.92599253.
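If you prefer not to spell the sum out, numpy's average function accepts weights and gives the same expected value in one call (this relies on Y summing to 1, as checked above):
>>> np.average(X, weights=Y)   # same value: 2.92599253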

sklearn's precision_recall_curve incorrect on small example

Here is a very small example using precision_recall_curve():
from sklearn.metrics import precision_recall_curve, precision_score, recall_score
y_true = [0, 1]
y_predict_proba = [0.25,0.75]
precision, recall, thresholds = precision_recall_curve(y_true, y_predict_proba)
precision, recall
which results in:
(array([1., 1.]), array([1., 0.]))
The above does not match the "manual" calculation which follows.
There are three possible class vectors depending on threshold: [0,0] (when the threshold is > 0.75) , [0,1] (when the threshold is between 0.25 and 0.75), and [1,1] (when the threshold is <0.25). We have to discard [0,0] because it gives an undefined precision (divide by zero). So, applying precision_score() and recall_score() to the other two:
y_predict_class=[0,1]
precision_score(y_true, y_predict_class), recall_score(y_true, y_predict_class)
which gives:
(1.0, 1.0)
and
y_predict_class=[1,1]
precision_score(y_true, y_predict_class), recall_score(y_true, y_predict_class)
which gives
(0.5, 1.0)
This seems not to match the output of precision_recall_curve() (which for example did not produce a 0.5 precision value).
Am I missing something?
I know I am late, but I had the same doubt and eventually worked it out.
The main point here is that precision_recall_curve() does not output precision and recall values anymore after full recall is obtained the first time; moreover, it concatenates a 0 to the recall array and a 1 to the precision array so as to let the curve start in correspondence of the y-axis.
In your specific example, you effectively get two arrays built like this (they are ordered the other way around because of sklearn's implementation):
precision, recall
(array([1., 0.5]), array([1., 1.]))
Then, the values of the two arrays that correspond to the second occurrence of full recall are omitted, and the 1 and 0 values (for precision and recall, respectively) are appended as described above:
precision, recall
(array([1., 1.]), array([1., 0.]))
I have tried to explain it here in full detail; another useful link is certainly this one.
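To make the described behaviour concrete, here is the same example again with the intermediate per-threshold values written out by hand as comments (the annotations are mine, not something the function returns):

from sklearn.metrics import precision_recall_curve

y_true = [0, 1]
y_scores = [0.25, 0.75]

# Per-threshold values, highest threshold first:
#   threshold 0.75 -> predictions [0, 1] -> precision 1.0, recall 1.0  (full recall reached here)
#   threshold 0.25 -> predictions [1, 1] -> precision 0.5, recall 1.0  (after full recall: dropped)
# The dropped point is discarded, then precision=1, recall=0 is appended as the end point.
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
print(precision)   # [1. 1.]
print(recall)      # [1. 0.]
print(thresholds)  # [0.75]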

Numpy finding the number of points within a specific distance in absolute value

I have a numpy array. I want to find the number of points which lie within an epsilon distance from each point.
My current code is (for an n*2 array, but in general I expect the array to be n*m):
epsilon = np.array([0.5, 0.5])
np.array([ 1/np.float(np.sum(np.all(np.abs(X-x) <= epsilon, axis=1))) for x in X])
But this code might not be efficient for an array of, say, 1 million rows and 50 columns. Is there a better, more efficient method?
For example, with data
X = np.random.rand(10, 2)
you can solve this using broadcasting:
1 / np.sum(np.all(np.abs(X[:, None, ...] - X[None, ...]) <= epsilon, axis=-1), axis=-1)
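Note that the intermediate boolean array in this one-liner has shape (n, n, m), so for something like a million rows it will not fit in memory. A middle-ground sketch, assuming the same 1/count output is wanted (the function name and chunk size below are made up for illustration), is to apply the same broadcasting one chunk of rows at a time:

import numpy as np

def inverse_neighbour_counts(X, epsilon, chunk=1000):
    # For each row of X, count the rows lying within epsilon of it in every
    # coordinate (absolute difference), processing `chunk` rows at a time so
    # the temporary boolean array stays at shape (chunk, n, m).
    n = X.shape[0]
    counts = np.empty(n)
    for start in range(0, n, chunk):
        block = X[start:start + chunk]
        within = np.all(np.abs(block[:, None, :] - X[None, :, :]) <= epsilon, axis=-1)
        counts[start:start + chunk] = within.sum(axis=-1)
    return 1.0 / counts

X = np.random.rand(10, 2)
epsilon = np.array([0.5, 0.5])
print(inverse_neighbour_counts(X, epsilon))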

Compute sum of pairwise sums of two array's columns

I am looking for a way to avoid the nested loops in the following snippet, where A and B are two-dimensional arrays, each of shape (m, n) with m, n being arbitrary positive integers:
import numpy as np
m, n = 5, 2
A = np.random.randint(0, 10, (m, n))
B = np.random.randint(0, 10, (m, n))
out = np.empty((n, n))
for i in range(n):
    for j in range(n):
        out[i, j] = np.sum(A[:, i] + B[:, j])
The above logic is roughly equivalent to
np.einsum('ij,ik', A, B)
with the exception that einsum computes the sum of products.
Is there a way, equivalent to einsum, that computes a sum of sums? Or do I have to write an extension for this operation?
einsum performs elementwise multiplication and then (optionally) sums. As such, it might not be directly applicable here. Read on!
Approach #1
We can extend the arrays to 3D and leverage broadcasting so that the first axes stay aligned while every column of A is added elementwise to every column of B. Finally, we sum along the first axis -
(A[:,:,None] + B[:,None,:]).sum(0)
Approach #2
We can simply do an outer addition of the column sums of the two arrays -
A.sum(0)[:,None] + B.sum(0)
Approach #3
And hence, bring in einsum -
np.einsum('ij->j',A)[:,None] + np.einsum('ij->j',B)
You can also use numpy.ufunc.outer, specifically here numpy.add.outer, after summing along axis 0, as @Divakar mentioned in Approach #2:
In [126]: numpy.add.outer(a.sum(0), b.sum(0))
Out[126]:
array([[54, 67],
       [43, 56]])
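A quick sanity check that all of the above agree with the original nested loops (random integers, so the actual numbers will differ from the Out above on each run):

import numpy as np

m, n = 5, 2
A = np.random.randint(0, 10, (m, n))
B = np.random.randint(0, 10, (m, n))

loop = np.empty((n, n))
for i in range(n):
    for j in range(n):
        loop[i, j] = np.sum(A[:, i] + B[:, j])

assert np.array_equal(loop, (A[:, :, None] + B[:, None, :]).sum(0))
assert np.array_equal(loop, A.sum(0)[:, None] + B.sum(0))
assert np.array_equal(loop, np.einsum('ij->j', A)[:, None] + np.einsum('ij->j', B))
assert np.array_equal(loop, np.add.outer(A.sum(0), B.sum(0)))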

Convert list of numpy.float64 to float in Python quickly

What is the fastest way of converting a list of elements of type numpy.float64 to type float? I am currently using the straightforward for loop iteration in conjunction with float().
I came across this post: Converting numpy dtypes to native python types; however, my question isn't about how to convert individual values in Python, but rather how best to convert an entire list from one type to another as quickly as possible (in this specific case, numpy.float64 to float). I was hoping for some secret Python machinery that I hadn't come across that could do it all at once :)
The tolist() method should do what you want. If you have a numpy array, just call tolist():
In [17]: a
Out[17]:
array([ 0.        ,  0.14285714,  0.28571429,  0.42857143,  0.57142857,
        0.71428571,  0.85714286,  1.        ,  1.14285714,  1.28571429,
        1.42857143,  1.57142857,  1.71428571,  1.85714286,  2.        ])
In [18]: a.dtype
Out[18]: dtype('float64')
In [19]: b = a.tolist()
In [20]: b
Out[20]:
[0.0,
0.14285714285714285,
0.2857142857142857,
0.42857142857142855,
0.5714285714285714,
0.7142857142857142,
0.8571428571428571,
1.0,
1.1428571428571428,
1.2857142857142856,
1.4285714285714284,
1.5714285714285714,
1.7142857142857142,
1.857142857142857,
2.0]
In [21]: type(b)
Out[21]: list
In [22]: type(b[0])
Out[22]: float
If, in fact, you really have a Python list of numpy.float64 objects, then @Alexander's answer is great, or you could convert the list to an array and then use the tolist() method. E.g.
In [46]: c
Out[46]:
[0.0,
0.33333333333333331,
0.66666666666666663,
1.0,
1.3333333333333333,
1.6666666666666665,
2.0]
In [47]: type(c)
Out[47]: list
In [48]: type(c[0])
Out[48]: numpy.float64
@Alexander's suggestion, a list comprehension:
In [49]: [float(v) for v in c]
Out[49]:
[0.0,
0.3333333333333333,
0.6666666666666666,
1.0,
1.3333333333333333,
1.6666666666666665,
2.0]
Or, convert to an array and then use the tolist() method.
In [50]: np.array(c).tolist()
Out[50]:
[0.0,
0.3333333333333333,
0.6666666666666666,
1.0,
1.3333333333333333,
1.6666666666666665,
2.0]
If you are concerned with the speed, here's a comparison. The input, x, is a python list of numpy.float64 objects:
In [8]: type(x)
Out[8]: list
In [9]: len(x)
Out[9]: 1000
In [10]: type(x[0])
Out[10]: numpy.float64
Timing for the list comprehension:
In [11]: %timeit list1 = [float(v) for v in x]
10000 loops, best of 3: 109 µs per loop
Timing for conversion to numpy array and then tolist():
In [12]: %timeit list2 = np.array(x).tolist()
10000 loops, best of 3: 70.5 µs per loop
So it is faster to convert the list to an array and then call tolist().
You could use a list comprehension:
floats = [float(np_float) for np_float in np_float_list]
So, out of the possible solutions I've come across (big thanks to Warren Weckesser and Alexander for pointing out the best approaches), I ran my current method against the one presented by Alexander to get a simple runtime comparison (these two choices because I have a true list of numpy.float64 elements and wish to convert them to float speedily):
Two approaches covered: list comprehension and basic for loop iteration.
First here's the code:
import time
import numpy

list1 = []
for i in range(0, 1000):
    list1.append(numpy.float64(i))

list2 = []
t_init = time.time()
for num in list1:
    list2.append(float(num))
t_1 = time.time()
list2 = [float(np_float) for np_float in list1]
t_2 = time.time()
print("t1 run time: {}".format(t_1 - t_init))
print("t2 run time: {}".format(t_2 - t_1))
I ran four times to give a quick set of results:
>>> run 1
t1 run time: 0.000179290771484375
t2 run time: 0.0001533031463623047
Python 3.4.0
>>> run 2
t1 run time: 0.00018739700317382812
t2 run time: 0.0001518726348876953
Python 3.4.0
>>> run 3
t1 run time: 0.00017976760864257812
t2 run time: 0.0001513957977294922
Python 3.4.0
>>> run 4
t1 run time: 0.0002455711364746094
t2 run time: 0.00015997886657714844
Python 3.4.0
Clearly, of these two approaches, the list comprehension is the faster way to convert a true list of numpy.float64 to float.
