Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated

I need to create a nested (ragged) array before creating a constraint with the CPLEX library.
Here is the code:
A = np.array([1,2,3,[1,3]])
A.astype(dtype = object)
p = np.array([40,100,70])
c = np.array([20,100,120])
v = 3
w = 5
for i, idx in enumerate(A):
    # get z and c
    z_lst = z[np.array(idx)-1]
    c_lst = c[np.array(idx)-1]
    # compute R
    mdl.add_constraint(1 + np.sum(z_lst) >= y * ((n + 1 - 2 * (len(A))) * (2 * v + 2 * w) +
                       np.sum((2 * w + v + c_lst))))  # need to check
Error message:
C:/Users/pknu/Documents/Research/project/Final Code/Marking Optimization Single Arm - 3 steps.py:10: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
A = np.array([1,2,3,[1,3]])
Could anyone suggest what I should do, please? Thank you in advance!
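Following the hint in the warning itself, here is a minimal sketch of the fix: pass dtype=object when the ragged array is created (also note that astype returns a new array rather than modifying A in place, so its result has to be reassigned to have any effect):
import numpy as np
# declaring the dtype up front tells NumPy not to look for a rectangular shape
A = np.array([1, 2, 3, [1, 3]], dtype=object)
print(A)       # [1 2 3 list([1, 3])]
print(len(A))  # 4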

Related

Optimizing asymmetrically reweighted penalized least squares smoothing (from matlab to python)

I'm trying to apply a method for baselining vibrational spectra, which is presented as an improvement over asymmetric and iterative re-weighted least-squares algorithms in the 2015 paper (doi:10.1039/c4an01061b), where the following MATLAB code was provided:
function z = baseline(y, lambda, ratio)
% Estimate baseline with arPLS in Matlab
N = length(y);
D = diff(speye(N), 2);
H = lambda*D'*D;
w = ones(N, 1);
while true
    W = spdiags(w, 0, N, N);
    % Cholesky decomposition
    C = chol(W + H);
    z = C \ (C' \ (w.*y) );
    d = y - z;
    % make d-, and get w^t with m and s
    dn = d(d<0);
    m = mean(dn);
    s = std(dn);
    wt = 1./ (1 + exp( 2* (d-(2*s-m))/s ) );
    % check exit condition and backup
    if norm(w-wt)/norm(w) < ratio, break; end
    w = wt;
end
which I rewrote in Python:
def baseline_arPLS(y, lam, ratio):
    # Estimate baseline with arPLS
    N = len(y)
    k = [numpy.ones(N), -2*numpy.ones(N-1), numpy.ones(N-2)]
    offset = [0, 1, 2]
    D = diags(k, offset).toarray()
    H = lam * numpy.matmul(D.T, D)
    w_ = numpy.ones(N)
    while True:
        W = spdiags(w_, 0, N, N, format='csr')
        # Cholesky decomposition
        C = cholesky(W + H)
        z_ = spsolve(C.T, w_ * y)
        z = spsolve(C, z_)
        d = y - z
        # make d- and get w^t with m and s
        dn = d[d<0]
        m = numpy.mean(dn)
        s = numpy.std(dn)
        wt = 1. / (1 + numpy.exp(2 * (d - (2*s-m)) / s))
        # check exit condition and backup
        norm_wt, norm_w = norm(w_-wt), norm(w_)
        if (norm_wt / norm_w) < ratio:
            break
        w_ = wt
    return z
Besides the input vector y, the method requires the parameters lam and ratio. It runs OK for values lam < 1e+07 and ratio > 1e-01, but outputs poor results. When the values are changed outside this range, for example lam=1e+07, ratio=1e-02, the job never finishes and the CPU starts heating up (I interrupted it after 1 min). Also, in both cases the following warning shows up:
/usr/local/lib/python3.9/site-packages/scipy/sparse/linalg/dsolve/linsolve.py:144: SparseEfficiencyWarning: spsolve requires A to be CSC or CSR matrix format
  warn('spsolve requires A to be CSC or CSR format',
although I added the recommended format='csr' option to the spdiags call.
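(A side note on the warning: numpy.linalg.cholesky returns a dense array, and W + H is already dense because H was built from D.toarray(), so spsolve receives a dense matrix no matter what format spdiags uses. A minimal sketch of a way to silence the warning, leaving the rest of the algorithm untouched, is to wrap the factor in a sparse matrix before the two triangular solves:)
from scipy.sparse import csc_matrix

C_sp = csc_matrix(C)          # sparse copy of the dense Cholesky factor
z_ = spsolve(C_sp.T, w_ * y)  # same two solves as before
z = spsolve(C_sp, z_)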
And here's some synthetic data (similar to the one in the paper) for testing purposes. The noise was added along with a 3rd-degree polynomial baseline. The method works well for parameters bl_1 and fails to converge for bl_2:
import numpy
from matplotlib import pyplot
from scipy.sparse import spdiags, diags, identity
from scipy.sparse.linalg import spsolve
from numpy.linalg import cholesky, norm
import sys
x = numpy.arange(0, 1000)
noise = numpy.random.uniform(low=0, high = 10, size=len(x))
poly_3rd_degree = numpy.poly1d([1.2e-06, -1.23e-03, .36, -4.e-04])
poly_baseline = poly_3rd_degree(x)
y = 100 * numpy.exp(-((x-300)/15)**2)+\
200 * numpy.exp(-((x-750)/30)**2)+ \
100 * numpy.exp(-((x-800)/15)**2) + noise + poly_baseline
bl_1 = baseline_arPLS(y, 1e+07, 1e-01)
bl_2 = baseline_arPLS(y, 1e+07, 1e-02)
pyplot.figure(1)
pyplot.plot(x, y, 'C0')
pyplot.plot(x, poly_baseline, 'C1')
pyplot.plot(x, bl_1, 'k')
pyplot.show()
sys.exit(0)
All this is telling me that I'm doing something very non-optimal in my Python implementation. Since I'm not knowledgeable enough about the intricacies of scipy computations, I'm kindly asking for suggestions on how to achieve convergence in these calculations.
(I encountered an issue running the "straight" MATLAB version of the code because the line D = diff(speye(N), 2); drops the last two rows of the matrix, creating a dimension mismatch later in the function. Following the description of matrix D's appearance, I substituted this line by directly creating a banded matrix using the diags function.)
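As an aside, a closer equivalent of the MATLAB line D = diff(speye(N), 2); would be the (N-2) x N second-difference matrix, which scipy.sparse.diags can build directly and keep sparse (a sketch, not what the code above uses):
from scipy.sparse import diags
# rows of [1, -2, 1], matching MATLAB's diff(speye(N), 2)
D = diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(N - 2, N))
H = lam * (D.T @ D)   # stays sparse, unlike the dense matmul version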
Guided by the comment @hpaulj made, and suspecting that the loop exit wasn't coded properly, I revisited the paper and found that the authors actually implemented an exit condition that is not featured in their MATLAB script. Changing the while-loop condition provides an exit for any set of parameters; my understanding is that the algorithm is not guaranteed to converge in all cases, which is why this condition is necessary but was omitted by mistake. Here's the edited version of my Python code:
def baseline_arPLS(y, lam, ratio):
    # Estimate baseline with arPLS
    N = len(y)
    k = [numpy.ones(N), -2*numpy.ones(N-1), numpy.ones(N-2)]
    offset = [0, 1, 2]
    D = diags(k, offset).toarray()
    H = lam * numpy.matmul(D.T, D)
    w_ = numpy.ones(N)
    i = 0
    N_iterations = 100
    while i < N_iterations:
        W = spdiags(w_, 0, N, N, format='csr')
        # Cholesky decomposition
        C = cholesky(W + H)
        z_ = spsolve(C.T, w_ * y)
        z = spsolve(C, z_)
        d = y - z
        # make d- and get w^t with m and s
        dn = d[d<0]
        m = numpy.mean(dn)
        s = numpy.std(dn)
        wt = 1. / (1 + numpy.exp(2 * (d - (2*s-m)) / s))
        # check exit condition and backup
        norm_wt, norm_w = norm(w_-wt), norm(w_)
        if (norm_wt / norm_w) < ratio:
            break
        w_ = wt
        i += 1
    return z
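For completeness, with the iteration cap in place the earlier test calls both terminate (at most N_iterations passes through the loop):
bl_1 = baseline_arPLS(y, 1e+07, 1e-01)
bl_2 = baseline_arPLS(y, 1e+07, 1e-02)  # no longer hangs; stops after 100 iterations at worst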

Python Truth value of a series is ambiguous error in Function

I'm trying to build a function that takes several scalar values as inputs and one series or array as an input as well.
The function applies calculations to each value in the series. It works fine so far. But now I'm adding a step where it has to check each value of the series: if it's less than X it performs one calculation, otherwise it performs a different calculation.
However, I keep getting a 'truth value of a Series is ambiguous' error and I can't seem to solve it.
What is a workaround?
My code is below
import numpy as np
import pandas as pd
import math

tramp = 2
Qo = 750
Qi = 1500
b = 1.2
Dei = 0.8
Df = 0.08
Qf = 1
tmax = 30
tper = 'm'
t = pd.Series(range(1,11))

def QHyp_Mod(Qi, b, Dei, Df, Qf, tmax, tper, t):
    tper = 12
    Qi = Qi * (365/12)
    Qf = Qf * (365/12)
    ai = (1 / b) * ((1 / (1 - Dei)) ** b - 1)
    aim = ai / tper
    ai_exp = -np.log(1 - Df)
    aim_exp = ai_exp / tper
    t_exp_sw = 118
    Qi_exp = Qi / ((1 + aim * t_exp_sw * b) ** (1 / b))
    Qcum = (Qi / (aim * (1 - b))) * (1 - (1 / ((1 + aim * t * b) ** ((1 - b) / b))))
    t_exp = t - t_exp_sw
    Qcum_Exp = (Qi_exp / aim_exp) * (1 - np.exp(-aim_exp * t_exp))
    if t < t_exp_sw:
        return Qcum
    else:
        return Qcum_exp

z = QHyp_Mod(Qi=Qi, b=b, Dei=Dei, Df=Df, Qf=Qf, tmax=tmax, tper=tper, t=t)
Replace the if-else statement:
if t < t_exp_sw:
    return Qcum
else:
    return Qcum_exp
with this:
Q.where(t < t_exp_sw, Q_exp)
return Q
The where method tests the condition for each element of Q: where it is true it keeps the original value, and where it is false it replaces it with the corresponding element of Q_exp.
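Applied to the question's own variable names, the tail of QHyp_Mod would look like this (a sketch, where Qcum holds the hyperbolic branch and Qcum_Exp the exponential one):
    # keep the hyperbolic value where t is before the switch time,
    # otherwise take the exponential value, element by element
    return Qcum.where(t < t_exp_sw, Qcum_Exp)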

Simpson's rule 3/8 for n intervals in Python

I'm trying to write a program that gives the integral approximation of e^(x^2) between 0 and 1, based on this integral formula:
Formula (image)
I've written this code so far, but it keeps giving the wrong answer (other methods give 1.46 as an answer; this one gives 1.006).
I think that maybe there is a problem with the two for loops that do the Riemann sum, or that there is a problem in the way I've written the formula. I also tried to rewrite the formula in other ways, but I had no success.
Any kind of help is appreciated.
import math
import numpy as np

def f(x):
    y = np.exp(x**2)
    return y

a = float(input("What is the lower limit? \n"))
b = float(input("What is the upper limit? \n"))
n = int(input("What is the number of intervals? "))
x = np.zeros([n+1])
y = np.zeros([n])
z = np.zeros([n])
h = (b-a)/n
print(h)
x[0] = a
x[n] = b
suma1 = 0
suma2 = 0
for i in np.arange(1,n):
    x[i] = x[i-1] + h
    suma1 = suma1 + f(x[i])
    alfa = (x[i]-x[i-1])/3
for i in np.arange(0,n):
    y[i] = (x[i-1] + alfa)
    suma2 = suma2 + f(y[i])
    z[i] = y[i] + alfa
int3 = ((b-a)/(8*n)) * (f(x[0]) + f(x[n]) + (3*(suma2 + f(z[i]))) + (2*(suma1)))
print(int3)
I'm not a math major, but I remember helping a friend with this rule for something about waterplane area for ships.
Here's an implementation based on Wikipedia's description of Simpson's 3/8 rule:
# The input parameters (assumes numpy as np and the question's f(x) = np.exp(x**2))
a, b, n = 0, 1, 10
# Divide the interval into 3*n sub-intervals
# and hence 3*n+1 endpoints
x = np.linspace(a, b, 3*n + 1)
y = f(x)
# The weight for each point
w = [1, 3, 3, 1]
result = 0
for i in range(0, 3*n, 3):
    # Calculate the area, 4 points at a time
    result += (x[i+3] - x[i]) / 8 * (y[i:i+4] * w).sum()
# result = 1.4626525814387632
You can do it using numpy.vectorize (based on the same Wikipedia article):
a, b, n = 0, 1, 10**6
h = (b-a) / n
x = np.linspace(0,n,n+1)*h + a
fv = np.vectorize(f)
(
3*h/8 * (
f(x[0]) +
3 * fv(x[np.mod(np.arange(len(x)), 3) != 0]).sum() + #skip every 3rd index
2 * fv(x[::3]).sum() + #get every 3rd index
f(x[-1])
)
)
#Output: 1.462654874404461
If you use numpy's built-in functions (which I think is always possible), performance will improve considerably:
a, b, n = 0, 1, 10**6
h = (b - a) / n
x = np.exp(np.square(np.linspace(0,n,n+1)*h + a))
(
3*h/8 * (
x[0] +
3 * x[np.mod(np.arange(len(x)), 3) != 0].sum()+
2 * x[::3].sum() +
x[-1]
)
)
#Output: 1.462654874404461
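If you need this more than once, the same weighting can be packaged into a small helper (a sketch; the 1/3/3/2 weight pattern and the requirement that n be a multiple of 3 come straight from the composite 3/8 rule):
import numpy as np

def simpson38(f, a, b, n):
    # Composite Simpson's 3/8 rule; n must be a multiple of 3
    if n % 3:
        raise ValueError("n must be a multiple of 3")
    x = np.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    w = np.full(n + 1, 3.0)   # weight 3 everywhere...
    w[::3] = 2.0              # ...except 2 at interior points whose index is a multiple of 3
    w[0] = w[-1] = 1.0        # ...and 1 at the two endpoints
    return 3 * h / 8 * (w * y).sum()

print(simpson38(lambda x: np.exp(x**2), 0, 1, 999))  # ≈ 1.46265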

How to fix Range function in Python?

I am trying to convert MATLAB code into Python and am facing errors related to the range function of Python.
The entire code can be found here; I am working on the range-imaging part of it.
MATLAB code
Ts=(2*(Xc-X0))/c;
Tf=(2*(Xc+X0))/c+Tp;
n=2*ceil((.5*(Tf-Ts))/dt);
t=Ts+(0:n-1)*dt;
dw=pi2/(n*dt);
w=wc+dw*(-n/2:n/2-1);
x=Xc+.5*c*dt*(-n/2:n/2-1);
kx=(2*w)/c;
The value of dt is 2.500000000000000e-09, n is 4268, and Ts is 1.300000000000000e-05.
Python
Ts = (2 * (Xc - X0)) / c
Tf = (2 * (Xc - X0)) / c + Tp
n = 2 * math.ceil((.5 * (Tf - Ts)) / dt)
t = list(Ts + (np.array(range(0, n-1)) * dt)) # tried using the solution in the comments
dw = pi2 / (n * dt)
w = list(wc + dw * (np.array(range(-n/2,n/2-1)))) # getting error here after trying same kind of solution
x = Xc + .5 * c * dt * range(-n/2,n/2-1)
kx=(2 * w) / c
The Python code throws the following error:
TypeError: 'float' object cannot be interpreted as an integer
Since you are coming from MATLAB, you most likely want to use numpy for vector/matrix calculations. Lists in Python cannot be multiplied elementwise like arrays in MATLAB, but numpy arrays can. range results in a range object, which you can convert to a numpy array, or you can directly use numpy.arange:
import numpy as np
import math
Ts = (2 * (Xc - X0)) / c
Tf = (2 * (Xc + X0)) / c + Tp  # the MATLAB version uses Xc + X0 here
n = 2 * math.ceil((.5 * (Tf - Ts)) / dt)
t = Ts + np.arange(0, n*dt, dt) # np.arange(start, stop, step)
dw = pi2 / (n * dt)
w = wc + dw * np.arange(-n/2, n/2) # not n/2-1 since stop is not included
x = Xc + 0.5 * c * dt * np.arange(-n/2, n/2)
kx = (2 * w) / c
A difference between MATLAB and NumPy in this case is that MATLAB includes the last value (i.e. the interval [start, stop]) whereas NumPy does not (i.e. the interval [start, stop)). This means you have to use n*dt as the stop argument.
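A tiny illustration of that endpoint difference:
print(np.arange(0, 4))   # [0 1 2 3]   -- MATLAB's 0:3 would give 0 1 2 3 as well, but 0:4 gives 0 1 2 3 4
print(np.arange(0, 5))   # [0 1 2 3 4] -- this is what matches MATLAB's 0:4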
The range function in Python returns a range object, which behaves like a lazy sequence rather than a list. Neither ranges nor lists can be multiplied by a decimal number, which is what you're trying to do: range(0,(n-1)) * dt.
But you could convert the range list to a numpy array:
t = list(Ts + (numpy.array(range(0, n-1)) * dt))
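One caveat with that line: MATLAB's 0:n-1 has n elements, while range(0, n-1) stops at n-2 and only has n-1 elements, so np.arange(n) is the closer equivalent (a sketch using the question's variables):
t = Ts + np.arange(n) * dt   # n samples, like MATLAB's Ts+(0:n-1)*dt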

Numpy tensor implementation slower than loop

I have two functions that compute the same metric. One uses a list comprehension to cycle through the calculation, the other uses only numpy tensor operations. The functions take in an (N, 3) array, where N is the number of points in 3D space. When N <~ 3000 the tensor function is faster; when N >~ 3000 the list-comprehension version is faster. Both seem to have linear time complexity in N, i.e. the two time-vs-N lines cross at N ≈ 3000.
def approximate_area_loop(section, num_area_divisions):
    n_a_d = num_area_divisions
    interp_vectors = get_section_interp_(section)
    a1 = section[:-1]
    b1 = section[1:]
    a2 = interp_vectors[:-1]
    b2 = interp_vectors[1:]
    c = lambda u: (1 - u) * a1 + u * a2
    d = lambda u: (1 - u) * b1 + u * b2
    x = lambda u, v: (1 - v) * c(u) + v * d(u)
    area = np.sum([np.linalg.norm(np.cross((x((i + 1)/n_a_d, j/n_a_d) - x(i/n_a_d, j/n_a_d)),
                                           (x(i/n_a_d, (j + 1)/n_a_d) - x(i/n_a_d, j/n_a_d))), axis=1)
                   for i in range(n_a_d) for j in range(n_a_d)])
    Dt = section[-1, 0] - section[0, 0]
    return area, Dt
def approximate_area_tensor(section, num_area_divisions):
    divisors = np.linspace(0, 1, num_area_divisions + 1)
    interp_vectors = get_section_interp_(section)
    a1 = section[:-1]
    b1 = section[1:]
    a2 = interp_vectors[:-1]
    b2 = interp_vectors[1:]
    c = np.multiply.outer(a1, (1 - divisors)) + np.multiply.outer(a2, divisors)  # c_areas_vecs_divs
    d = np.multiply.outer(b1, (1 - divisors)) + np.multiply.outer(b2, divisors)  # d_areas_vecs_divs
    x = np.multiply.outer(c, (1 - divisors)) + np.multiply.outer(d, divisors)    # x_areas_vecs_Divs_divs
    u = x[:, :, 1:, :-1] - x[:, :, :-1, :-1]  # u_areas_vecs_Divs_divs
    v = x[:, :, :-1, 1:] - x[:, :, :-1, :-1]  # v_areas_vecs_Divs_divs
    sub_area_norm_vecs = np.cross(u, v, axis=1)  # areas_crosses_Divs_divs
    sub_areas = np.linalg.norm(sub_area_norm_vecs, axis=1)  # areas_Divs_divs (values are now sub areas)
    area = np.sum(sub_areas)
    Dt = section[-1, 0] - section[0, 0]
    return area, Dt
Why does the list-comprehension version run faster at large N? Surely the tensor version should be faster? I'm wondering if it's something to do with the size of the intermediate arrays, meaning they're too big to fit in cache. Please ask if I haven't included enough information; I'd really like to get to the bottom of this.
The bottleneck in the fully vectorized function was indeed in np.linalg.norm, as @hpaulj's comment suggested.
norm was used only to get the magnitude of all the vectors contained in axis 1. A much simpler and faster method was to just:
sub_areas = np.sqrt((sub_area_norm_vecs*sub_area_norm_vecs).sum(axis = 1))
This gives exactly the same results and made the code up to 25 times faster than the loop implementation (even when the loop doesn't use linalg.norm either).
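An equivalent one-liner, assuming sub_area_norm_vecs keeps the four-axis layout from the tensor function above (vector components on axis 1), is to let einsum do the squared sum:
# sum the squared components over axis 1 without building the full squared array
sub_areas = np.sqrt(np.einsum('abcd,abcd->acd',
                              sub_area_norm_vecs, sub_area_norm_vecs))
This avoids materializing the intermediate elementwise product that the multiply-then-sum version creates.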
