Decimal module is not working with Numpy or Scipy - python-3.x

I want to use the Decimal module.
from decimal import Decimal, getcontext
getcontext().prec = 3
d1 = Decimal("0.1")
a = float(0.20052)
b = str(a)
d2 = Decimal(b)
q = d1 + d2
print(q)  ### 0.301
and
getcontext().prec = 1
d1 = Decimal("0.1")
a = float(0.20052)
b = str(a)
d2 = Decimal(b)
q = d1+d2
print(q)  ## 0.3
is working as expected.
However, the code below is not working. I want "0.0".
import numpy as np
from scipy import stats
np.random.seed(12345678)  # fix random seed to get the same result
n1 = 200  # size of first sample
n2 = 300  # size of second sample
rvs1 = stats.norm.rvs(size=n1, loc=0., scale=1)
rvs2 = stats.norm.rvs(size=n2, loc=0.5, scale=1.5)
print(stats.mannwhitneyu(rvs1, rvs2))  ### MannwhitneyuResult(statistic=25639.0, pvalue=0.0029339910470636116)
p_value = stats.mannwhitneyu(rvs1, rvs2).pvalue
print(p_value)  ### 0.0029339910470636116
p_str = str(p_value)
getcontext().prec = 1
p_n = Decimal(p_str)
print(p_n)  ### 0.0029339910470636116
I saw this question and used the item() method, but the result has not changed. I want "0.0029".
getcontext().prec = 4
p2 = Decimal(p_value.item())
print(p2)  #### 0.0029339910470636116311682339841127031832002103328704833984375
MacOS 10.14.5; python 3.7.2; jupyter notebook 4.4.0; numpy 1.17.2; scipy 1.2.1
In addition, I want "0.0029", but the results are shown below.
getcontext().prec = 4
p_n = Decimal(p_str)
print(p_n)  ## 0.0029339910470636116
p_n = Decimal(p_str) + 0
print(p_n)  ## 0.002934
p_n = Context(prec=4).create_decimal(p_str) + 0
print(p_n)  ## 0.002934

... the result has not changed.
There is a conceptual gap here.
Changing prec of the current context changes how operations such as __add__(...) behave.
It does not change how the constructor behaves -- if you supply a high-precision input, the constructor will still return a high-precision result.
Consider this demo:
>>> getcontext().prec = 1
>>>
>>> Decimal('.12345')
Decimal('0.12345')
>>>
>>> Decimal('.12345') + 0
Decimal('0.1')
And naturally, the prec attribute has no effect at all on unrelated math packages that use IEEE-754 floating-point operations, such as numpy.
If p_value has many digits of precision, then it is unsurprising that Decimal(p_value) will report many digits of precision.
Perhaps you'd like to add 0 to that?
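For the concrete case above, a minimal sketch (assuming the p_value from the mannwhitneyu call earlier): either round to four significant digits through the context, or pin four decimal places with quantize:
from decimal import Decimal, getcontext
getcontext().prec = 4
print(Decimal(str(p_value)) + 0)                          # 0.002934 (four significant digits)
print(Decimal(str(p_value)).quantize(Decimal('0.0001')))  # 0.0029   (four decimal places)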

Related

4D chaotic system Lyapunov exponent

I am trying to work out the Lyapunov spectrum and its values for a 4-dimensional chaotic attractor. So far the code below works well for a three-dimensional system, but errors arise for 4D and 5D systems.
import matplotlib.pyplot as plt
import numpy as np
from scipy.integrate import odeint

def diff_Lorenz(u):
    x, y, z, w = u
    f = [a*(y-x), x*z+w, b-x*y, z*y-c*w]
    Df = [[-a, a, 0, 0], [z, 0, x, 1], [-y, -x, 0, 0], [0, z, y, -c]]
    return np.array(f), np.array(Df)

def LEC_system(u):
    #x,y,z = u[:3]
    U = u[2:18].reshape([4,4])
    L = u[12:15]
    f, Df = diff_Lorenz(u[:4])
    A = U.T.dot(Df.dot(U))
    dL = np.diag(A).copy()
    for i in range(4):
        A[i,i] = 0
        for j in range(i+1,4): A[i,j] = -A[j,i]
    dU = U.dot(A)
    return np.concatenate([f, dU.flatten(), dL])

a = 6; b = 11; c = 5
u0 = np.ones(4)
U0 = np.identity(4)
L0 = np.zeros(4)
u0 = np.concatenate([u0, U0.flatten(), L0])
t = np.linspace(0, 10, 301)
u = odeint(lambda u, t: LEC_system(u), u0, t, hmax=0.05)
L = u[5:, 12:15].T / t[5:]
# plt.plot(t[5:], L.T)
# plt.show()
p1 = L[0,:]; p2 = L[1,:]; p3 = L[2,:]; p4 = L[3,:]
L1 = np.mean(L[0,:]); L2 = np.average(L[1,:]); L3 = np.average(L[2,:]); L4 = np.average(L[3,:])
t1 = np.linspace(0, 100, len(p1))
plt.plot(t1, p1); plt.plot(t1, p2); plt.plot(t1, p3); plt.plot(t1, p4)
# plt.show()
print('LES= ', L1, L2, L3, L4)
the output error is
D:\anaconda3\lib\site-packages\scipy\integrate\odepack.py:247: ODEintWarning: Excess work done on this call (perhaps wrong Dfun type). Run with full_output = 1 to get quantitative information.
warnings.warn(warning_msg, ODEintWarning)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_7008/1971199288.py in <module>
32 # plt.plot(t[5:],L.T)
33 # plt.show()
---> 34 p1=L[0,:];p2=L[1,:];p3=L[2,:];p4=L[3,:]
35 L1=np.mean(L[0,:]);L2=np.average(L[1,:]);L3=np.average(L[2,:]);L4=np.average(L[3,:])
36 t1 = np.linspace(0,100,len(p1))
IndexError: index 3 is out of bounds for axis 0 with size 3
What is wrong?
The expected output is
L1 = 0.5162, L2 = -0.0001, L3 = -4.9208, L4 = -6.5954
In LEC_system(u), the flat vector u contains, in sequence: the state vector (n components), the eigenbasis U (an n x n matrix), and the accumulated exponents L (n components).
With n = 4, this translates to the decomposition
def LEC_system(u):
    #x,y,z,w = u[:4]
    U = u[4:20].reshape([4,4])
    L = u[20:24]
    f, Df = diff_Lorenz(u[:4])
    A = U.T.dot(Df.dot(U))
    dL = np.diag(A).copy()
    for i in range(4):
        A[i,i] = 0
        for j in range(i+1,4): A[i,j] = -A[j,i]
    dU = U.dot(A)
    return np.concatenate([f, dU.flatten(), dL])
Of course, in the evaluation after the integration one has to likewise use the correct segment of the state vector
L = u[5:,20:24].T/t[5:]
Then I get the plot of the exponent estimates,
and using only the latter quarter of the graphs, after integrating to t = 60,
LES= 0.029214865425355396 -0.43816854013111833 -4.309199339754925 -6.28183676249535
These are still not the expected values, as the system appears to be contracting along all directions transversal to the solution curve.
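A minimal sketch of that evaluation step (assuming the corrected LEC_system above together with diff_Lorenz and the initial conditions from the question):
t = np.linspace(0, 60, 1801)                     # integrate further, to t = 60
u = odeint(lambda u, t: LEC_system(u), u0, t, hmax=0.05)
L = u[1:, 20:24].T / t[1:]                       # skip t = 0 to avoid division by zero
LES = L[:, 3 * L.shape[1] // 4:].mean(axis=1)    # average only the last quarter
print('LES=', *LES)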

py2 vs py3 addition output difference in float format

a = 310.97
b = 233.33
sum = 0.0
for i in [a, b]:
    sum += i
print(sum)
py2 o/p: 544.3
py3 o/p: 544.3000000000001
Is there any way to make the Python 3 output the same as Python 2 after futurizing, without rounding?
You could convert the values to integers before performing the operation and afterwards divide by a constant e.g. 100.0 in this case.
a = 310.97
b = 233.33
c = int(a * 100)
d = int(b * 100)
sum = 0
for i in [c, d]:
    sum += i
result = sum / 100.0
print(result)  # 544.3
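Note that int() truncates; because of the binary representation, a product like b * 100 can land just below the intended integer, so round() is the safer scaling step here:
c = round(a * 100)   # rounds to the nearest integer instead of truncating
d = round(b * 100)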
The reason for the difference is the float-to-string conversion: Python 2's str() rounds a float to 12 significant digits, while Python 3 prints the shortest string that round-trips to the same float. Formatting with 12 significant digits reproduces the Python 2 output:
a = 310.97
b = 233.33
sum = 0.0
for i in [a, b]:
    sum += i
print("{:.12g}".format(sum))  # 544.3
See this answer for further details: Python format default rounding when formatting float number

Chudnovsky algorithm python incorrect decimals

My goal is to get 100,000 or 200,000 correct decimals of Pi in Python. For this, I have tried using the Chudnovsky algorithm, but I've got some issues along the way.
First, the program only gives me 29 characters, instead of the 50 I want in order to test the correctness. I know this is a small issue, but I don't understand what I've done wrong.
Second, only the first 14 decimals are correct. After those, the decimals no longer agree with the published digits of Pi. How do I get many more correct decimals?
And last, how do I let my code run on all 4 of the threads I have? I've tried using Pool, but it doesn't seem to work (checked it with the Windows Task Manager).
This is my code:
from math import *
from decimal import Decimal, localcontext
from multiprocessing import Pool
import time

k = 0
s = 0
c = Decimal(426880*sqrt(10005))

if __name__ == '__main__':
    start = time.time()
    pi = 0
    with localcontext() as ctx:
        ctx.prec = 50
        with Pool(None) as pool:
            for k in range(0,500):
                m = Decimal((factorial(6 * k)) / (factorial(3 * k) * Decimal((factorial(k) ** 3))))
                l = Decimal((545140134 * k) + 13591409)
                x = Decimal((-262537412640768000) ** k)
                subPi = Decimal(((m*l)/x))
                s = s + subPi
            print(c*(s**-1))
    print(time.time() - start)
In addition to the small details discussed in the comments and proposed by @Mark Dickinson, I think I've fixed the multiprocessing part, but I haven't had a chance to test it; let me know if it works properly.
UPDATE: the problems after the 28th digit were due to sq and c being assigned before the decimal context change. Reassigning them after changing the context precision solved the problem.
from math import *
import decimal
from decimal import Decimal, localcontext
from multiprocessing import Pool
import time

k = 0
s = 0
sq = Decimal(10005).sqrt()  # useless here
c = Decimal(426880*sq)      # useless here

def calculate():
    global s, k
    for k in range(0,500):
        m = Decimal((factorial(6 * k)) / (factorial(3 * k) * Decimal((factorial(k) ** 3))))
        l = Decimal((545140134 * k) + 13591409)
        x = Decimal((-262537412640768000) ** k)
        subPi = Decimal((m*l)/x)
        s = s + subPi
    print(c*(s**-1))

if __name__ == '__main__':
    start = time.time()
    pi = 0
    decimal.getcontext().prec = 100  # change the precision to increase the result digits
    sq = Decimal(10005).sqrt()
    c = Decimal(426880*sq)
    pool = Pool()
    result = pool.apply_async(calculate)
    result.get()
    print(time.time() - start)
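As a rough guide, each Chudnovsky term contributes about 14 correct digits, so for larger targets both the loop length and the context precision have to scale with the desired digit count instead of staying fixed at 500 and 100. A minimal sketch of that sizing, building on the imports above (the target of 1000 digits is just an example):
digits = 1000                              # example target digit count
decimal.getcontext().prec = digits + 10    # a few guard digits
terms = digits // 14 + 1                   # ~14 digits per Chudnovsky term
# then iterate for k in range(terms) instead of range(0, 500)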

Does SciPy.sparse.linalg.svds give matrix rank?

I have a largish sparse, binary-valued, rectangular m x n matrix, M, where n > m. My understanding of matrix rank suggests the largest possible rank is m, and my understanding of SVD suggests the rank of a matrix can be found by identifying the number of non-zero singular values.
I'm attempting to use SciPy.sparse.linalg.svds to determine the rank of M. The first problem is that I cannot compute m singular values, since k can only go up to p = m - 1. So I thought I'd be clever and compute the p highest values and the p lowest values, combine them, run set() to find the unique values, and end up with a list of at most m values. This didn't work out according to plan.
Here's a MWE:
import scipy.sparse
import scipy.sparse.linalg
import numpy
import itertools

m = 6
n = 10
test = scipy.sparse.rand(m, n, density=0.25, format='lil', dtype=None, random_state=None)
for i, j in itertools.product(list(range(m)), list(range(n))):
    test[i, j] = 1 if test[i, j] > 0 else 0
U1, S1, VT1 = scipy.sparse.linalg.svds(test, k=min(test.shape) - 1, ncv=None, tol=1e-5, which='LM', v0=None, maxiter=None,
                                       return_singular_vectors=True)
U2, S2, VT2 = scipy.sparse.linalg.svds(test, k=min(test.shape) - 1, ncv=None, tol=1e-5, which='SM', v0=None, maxiter=None,
                                       return_singular_vectors=True)
S = list(set(numpy.concatenate((S1, S2), axis=0)))
len(S)
Here's a sample output:
10
with S being
[0.5303120147925737,
1.0725314055439354,
2.7940865631779643,
1.5060744813473148,
1.8412737686034186,
0.3208993522030293,
0.5303120147925728,
1.072531405543936,
1.5060744813473153,
1.841273768603419]
How can an m x n matrix with m < n have a rank of n? Are my assumptions above incorrect, or am I misapplying the function? My real M is sparse, binary-valued, and roughly 300 x 500.
Thanks for looking!
With help from @tch I've come up with the following hack. To check for rank = m, I only need to check the smallest value and append it to the m - 1 values obtained from the svds highest-values call. It turns out svds doesn't report 0s, so the smallest-values call will return nan for rank < m. Here's the revised code:
import scipy.sparse
import scipy.sparse.linalg
import numpy
import itertools

m = 6
n = 10
test = scipy.sparse.rand(m, n, density=0.25, format='lil', dtype=None, random_state=None)
test = test > 0
test = test.astype('d')
U1, S1, VT1 = scipy.sparse.linalg.svds(test, k=min(test.shape) - 1, ncv=None, tol=1e-5, which='LM', v0=None, maxiter=None,
                                       return_singular_vectors=True)
U2, S2, VT2 = scipy.sparse.linalg.svds(test, k=1, ncv=None, tol=1e-5, which='SM', v0=None, maxiter=None,
                                       return_singular_vectors=True)
S = list(set(numpy.concatenate((S1, S2), axis=0)))
print(sum(x > 1e-10 for x in S))
S
What you are trying to do would work in exact arithmetic (assuming the matrix has no repeat singular values). However, due to numerical rounding errors, it won't work in practice.
To see this, try
import numpy as np
C = np.random.randn(10, 3)
u, s, vt = np.linalg.svd(C @ C.T)
Note that C @ C.T is a 10x10 matrix with rank 3. However, you will see that none of the singular values are exactly zero (though 7 are close to 0).
When finding the rank of a matrix numerically, thresholding is often used to determine what it means for a singular value to be 0. For instance, everything below 1e-10 may be set to zero.
If the matrix has exact rank k, hopefully you will see k singular values away from 0, and then min(m,n)-k singular values very close to zero. However, depending on the matrix, there may not even be a well defined "drop".
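A minimal sketch of that thresholding idea, using the example matrix above and a 1e-10 cutoff:
import numpy as np
C = np.random.randn(10, 3)
M = C @ C.T                               # 10 x 10 matrix of rank 3
s = np.linalg.svd(M, compute_uv=False)    # singular values only
rank = int(np.sum(s > 1e-10))             # count values above the threshold
print(rank)                               # 3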
So for your example, you could try removing elements which are within some threshold of one another. However, this could of course run into issues if the matrix has repeated singular values.
You could also just compute the smallest singular values and see how many are near zero. Presumably the matrix is at least rank 1, so the first singular value will be nonzero.
As a note about finding where test[i,j] > 0: you can just do test > 0 and it will give a boolean array with True in the nonzero entries and False elsewhere. You can also set the dtype of the random matrix to bool and it will be True wherever the random number is nonzero.

Looping through multiple dataframes does not calculate properly

I am attempting to perform calculations, then loop through the same pandas dataframe and perform the same calculation but with an altered variable (one that increases each time it loops). If the loop range is set to just 1, all rows calculate properly and the new dataframe is created. However, attempting to actually loop the program results in NaN values everywhere except the first row.
Omega loop
for i in range(10):
    #Determine first and last Julian dates of data
    t1 = df.ix[:0,'jd']
    t2 = df.ix[n-1:,'jd']
    t2 = t2.reset_index(drop=True)
    tj = t2-t1
    #Iterate over each observation within each star file
    jd = df['jd']
    dmag = df['dmag']
    sinw = np.sin(2*omega*jd)
    sum1 = sinw.sum()
    cosw = np.cos(2*omega*jd)
    sum2 = cosw.sum()
    #Calculate tau
    tau = ((np.arctan(sum1/sum2))/(2*omega))
    avgdmag = dmag.sum()/n
    #Calculate sample variance
    tot = (df['dmag']-avgdmag)**2
    tot2 = tot.sum()
    var = tot2/(n-1)
    #Calculate sums for power series
    sum3 = sum3 + ((dmag - avgdmag)*np.cos(omega*(jd-tau)))
    sum4 = sum4 + (np.cos(omega*(jd-tau)))**2
    sum5 = sum5 + ((dmag - avgdmag)*np.sin(omega*(jd-tau)))
    sum6 = sum6 + (np.sin(omega*(jd-tau)))**2
    #Calculate power series and normalized power series
    px = (((sum3**2)/sum4)+((sum5**2)/sum6))/2
    pn = px/var
    #Step through sequential frequencies
    omega = omega + (1/tj)
I also received a runtime warning from NumPy caused by the omega term at the end. I disabled "invalid" warnings, as it was not causing an issue with the actual calculations. The first results that compute incorrectly are sinw and cosw, and all subsequently calculated values contain NaNs.
It is because your tj is a pd.Series of length 1, not a scalar as you would expect. After the first loop, omega = omega + 1/tj becomes a Series of length 1 (with 0 as its index). Then in the 2nd loop, tau = ((np.arctan(sum1/sum2))/(2*omega)) also becomes such a Series. When updating sum3, jd - tau (a Series of length n minus a Series of length 1) gives you a Series that is all NaN except at index 0, where the two series' indexes align. After that, all subsequent Series have lots of NaNs.
The solution is to calculate tj as a scalar, such as
tj = df.loc[n-1,'jd'] - df.loc[0,'jd'] (assuming n = len(df)).
Anyway, your piece of code can be re-written for readability.
tj = df.loc[n-1,'jd'] - df.loc[0,'jd']   # tj is loop invariant
for _ in range(10):
    sum1 = np.sin(2*omega*df['jd']).sum()
    sum2 = np.cos(2*omega*df['jd']).sum()
    tau = np.arctan(sum1/sum2)/(2*omega)
    avgdmag = df['dmag'].mean()
    var = df['dmag'].var()               # unbiased sample variance
    sum3 += ((df['dmag'] - avgdmag)*np.cos(omega*(df['jd']-tau)))
    sum4 += (np.cos(omega*(df['jd']-tau)))**2
    sum5 += ((df['dmag'] - avgdmag)*np.sin(omega*(df['jd']-tau)))
    sum6 += (np.sin(omega*(df['jd']-tau)))**2
    px = (((sum3**2)/sum4)+((sum5**2)/sum6))/2
    pn = px/var
    omega += 1/tj
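The rewrite assumes omega and the running sums already exist before the loop; a minimal sketch of that setup (the starting value of omega here is purely hypothetical):
n = len(df)                        # number of observations
omega = 1.0                        # hypothetical starting frequency
sum3 = sum4 = sum5 = sum6 = 0.0    # running sums for the power series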
