BiCGStab yields unexpected breakdown flag - python-3.x

I need to solve a cascade of sparse linear systems Ax=b. The solution x of the first system is an input to the second system, which is an input to the third, and so on. Because numerical errors compound, and for other reasons, I have to use scipy.sparse.linalg.bicgstab as my linear solver. However, for a system that is not even ill-conditioned and definitely has an inverse, the solver gives me the flag for "illegal input or breakdown".
import numpy as np
from scipy.sparse.linalg import bicgstab, inv
from scipy import sparse
A = np.array(
[[ -1., 0., 0., 0., 0., 0., 0., 0.],
[ 0., -1., 0., 0., 0., 0., 0., 0.],
[ 0., 0., -10., 0., 0., 0., 0., 0.],
[ 0., 0., 0., -10., 0., 0., 0., 0.],
[ 0., 0., 3., 0., -3., 0., 0., 0.],
[ 0., 0., 0., 3., 0., -3., 0., 0.],
[ 0., 0., 0., 7., 3., 0., -10., 0.],
[ 0., 0., 7., 0., 0., 3., 0., -10.]]
)
A = -sparse.csc_matrix(A)
b = np.array([ 1., 0., 10., 0., 0., 0., 0., 0.])
x, flag = bicgstab(A=A, b=b, maxiter=40, tol=1e-6)
x, flag
>>> (array([1. , 0. , 1. , 0. , 1.00118012,
0. , 0.3004875 , 0.70009946]), -10)
Just to prove the point:
inv(A).dot(b)
>>> array([1. , 0. , 1. , 0. , 1. , 0. , 0.3, 0.7])
The output above is exactly what I expect. Does anyone know why bicgstab is not giving me the desired output? I could not find documentation on "illegal input or breakdown" for bicgstab, which is why I am asking my question on SO.

The -10 error code does not necessarily mean that you have a wrong input; in your case, it is most likely that the breakdown occurred during the iterative solve.
By slightly changing your RHS:
b = np.array([ 1., 0., 0., 0., 10., 0., 0., 0.])
scipy's bicgstab has no trouble converging, even without a preconditioner:
x, flag = bicgstab(A=A, b=b, maxiter=40, tol=1e-6)
print(x, flag)
(array([1. , 0. , 0. , 0. , 3.33333333,
0. , 1. , 0. ]), 0)
The fact that the matrix has an inverse and a decent condition number
print(np.linalg.cond(A.toarray()))  # densify: np.linalg.cond does not accept sparse matrices
14.616397823169317
does not guarantee that a solution is easy to obtain for a particular RHS, especially with an iterative solver, or with one particular iterative solver. It seems to me (without an elaborate analysis of the matrix spectrum and its kernel space) that your RHS lies exactly in such a "bad region".
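If you want a first look at the spectrum yourself, a minimal check could be (a sketch; dense eigenvalues are fine for a matrix this small):
import numpy as np
print(np.linalg.eigvals(A.toarray()))  # A is the sparse matrix built above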
If you are simply interested in a solution, I would suggest switching to GMRES:
from scipy.sparse.linalg import gmres
x, flag = gmres(A=A, b=b, maxiter=40, tol=1e-6)
x, flag
(array([1. , 0. , 0.1 , 0. , 0.1 , 0. , 0.03, 0.07]), 0)
If you are interested in investigating why BiCGStab failed while GMRES succeeded on this system, I would invite you to post a narrowed-down question on Computational Science SE.
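If you would rather stay with BiCGStab, an ILU preconditioner often helps it past such breakdowns. A minimal sketch (my addition, not part of the original answer) using scipy's spilu:
from scipy.sparse.linalg import bicgstab, spilu, LinearOperator
ilu = spilu(A)                          # incomplete LU factorization of the sparse CSC matrix
M = LinearOperator(A.shape, ilu.solve)  # expose its solve as a preconditioner
x, flag = bicgstab(A=A, b=b, M=M, maxiter=40, tol=1e-6)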

Related

pytorch loss function for regression model with a vector of values

I'm training a CNN architecture to solve a regression problem using PyTorch, where my output is a tensor of 25 values. The input/target tensor could be either all zeros or a Gaussian distribution with a sigma value of 2. An example of a 4-sample batch is this one:
[[0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.88250,0.60653, 0.32465, 0.13534, 0.043937, 0.011109, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.88250,0.60653, 0.32465, 0.13534, 0.043937, 0.011109, 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.88250,0.60653, 0.32465, 0.13534 ],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]
My question is how to design a loss function so that the model effectively learns the regression output with 25 values.
I have tried 2 types of loss, torch.nn.MSELoss() and torch.nn.MSELoss()-torch.nn.CosineSimilarity(). They sort of work. However, sometimes the network has difficulty converging, especially when there are a lot of samples that are all zeros, which leads the network to output a vector of 25 uniformly small values.
My question is, is there any other loss which we could try?
Your values do not seem widely different in scale, so MSELoss seems like it would work fine. Your model could be collapsing because of the many zeros in your target.
You can always try torch.nn.L1Loss(), but I do not expect it to be much better than torch.nn.MSELoss().
I suggest that you instead try to predict the Gaussian mean, mu, and later re-create the Gaussian for each sample if you really need it.
So you have two alternatives if you choose to try this method.
Alt 1
A good alternative is to encode your target to look like a classification target. Each 25-element vector becomes a single value: the index where the original target == 1 (so the possible classes will be 0, 1, 2, ..., 24). We can then assign any sample that contains only zeros to an extra last class, 25. So your target:
[[0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.88250,0.60653, 0.32465, 0.13534, 0.043937, 0.011109, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.88250,0.60653, 0.32465, 0.13534, 0.043937, 0.011109, 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.88250,0.60653, 0.32465, 0.13534 ],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]
becomes
[4,
10,
20,
25]
If you do this, then you can try the common torch.nn.CrossEntropyLoss().
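With that encoding, a minimal sketch of the loss computation (the model would need 26 output logits; the shapes and values here are illustrative only):
import torch
criterion = torch.nn.CrossEntropyLoss()
logits = torch.randn(4, 26)              # [batch, n_classes]: 25 peak positions + 1 "all zeros" class
targets = torch.tensor([4, 10, 20, 25])  # the encoded targets from above
loss = criterion(logits, targets)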
I do not know what your dataloader looks like but given a single sample in your original format, you can convert it to my proposed format with:
def encode(tensor):
    if tensor.sum() == 0:
        return len(tensor)          # all-zeros sample -> the extra class 25
    return torch.argmax(tensor)     # otherwise the class is the index of the peak
and back to a Gaussian with:
def decode(value):
    n_values = 25
    zero = torch.zeros(n_values)
    if value == n_values:
        return zero
    # Create a Gaussian centered on value
    std = 2
    n = torch.arange(n_values) - value
    sig = 2 * std**2
    gauss = torch.exp(-n**2 / sig)
    # Only keep the values near the peak; the rest stay zero
    start_ix = max(value - 6, 0)
    end_ix = min(value + 7, n_values)
    zero[start_ix:end_ix] = gauss[start_ix:end_ix]
    return zero
(Note: I have not tried these with batches, only single samples.)
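A quick round-trip sanity check of the two helpers above (a sketch):
target = torch.zeros(25)
target[4] = 1.0                  # peak at index 4
cls = encode(target)             # -> tensor(4)
reconstructed = decode(cls)      # Gaussian centered on index 4
assert encode(reconstructed) == cls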
Alt 2
The second option is to change your regression target (still only the argmax position, mu) to a nicer regression value in the range 0-1, and to add a separate neuron that outputs a "mask value" (also 0-1). Then your batch of:
[[0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.88250,0.60653, 0.32465, 0.13534, 0.043937, 0.011109, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.88250,0.60653, 0.32465, 0.13534, 0.043937, 0.011109, 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.88250,0.60653, 0.32465, 0.13534 ],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]
becomes
# [Mask, mu]
[
[1, 0.1666], # True, 4/24
[1, 0.4166], # True, 10/24
[1, 0.8333], # True, 20/24
[0, 0] # False, undefined
]
If you are using this setup, then you should be able to use an MSELoss with a small modification:
def custom_loss(input, target):
    # Assume input and target have shape [batch, 2], laid out as [mask, mu]
    mask = target[..., 0]
    mask_loss = torch.nn.functional.mse_loss(input[..., 0], target[..., 0])
    mu_loss = torch.nn.functional.mse_loss(mask * input[..., 1], mask * target[..., 1])
    return (mask_loss + mu_loss) / 2
This loss only looks at the 2nd value (mu) if the mask of the target is 1. Otherwise it only tries to optimize for the correct mask.
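A quick smoke test of this loss with made-up predictions (values are illustrative only):
pred = torch.tensor([[0.9, 0.20],
                     [0.2, 0.70]])
target = torch.tensor([[1.0, 0.1666],  # masked-in sample, mu = 4/24
                       [0.0, 0.0]])    # "all zeros" sample, mu is ignored
print(custom_loss(pred, target))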
To encode to this format you would use:
def encode(tensor):
    n_values = 25
    if tensor.sum() == 0:
        return torch.tensor([0.0, 0.0])               # mask off, mu unused
    mu = torch.argmax(tensor).item() / (n_values - 1)
    return torch.tensor([1.0, mu])                    # mask on, mu scaled into [0, 1]
and to decode:
def decode(tensor):
    n_values = 25
    # Parse values
    mask, value = tensor
    mask = torch.round(mask)
    value = int(torch.round((n_values - 1) * value))  # back to an integer peak index
    zero = torch.zeros(n_values)
    if mask == 0:
        return zero
    # Create a Gaussian centered on value
    std = 2
    n = torch.arange(n_values) - value
    sig = 2 * std**2
    gauss = torch.exp(-n**2 / sig)
    # Only keep the values near the peak; the rest stay zero
    start_ix = max(value - 6, 0)
    end_ix = min(value + 7, n_values)
    zero[start_ix:end_ix] = gauss[start_ix:end_ix]
    return zero

How to interpret ACF and PACF functions from statsmodels?

I'm trying to determine the p and q values for an ARMA model. The time series is already stationary, and I was looking at ACF and PACF plots, but I need to get those p and q values "on the go" (as when performing a simulation).
I noticed that statsmodels actually has two functions, acf and pacf, but I don't understand how to use them properly.
This is what the code looks like:
from statsmodels.tsa.stattools import acf, pacf
>>>acf(data,qstat=True)
(array([1. , 0.98707179, 0.9809318 , 0.9774078 , 0.97436479,
0.97102392, 0.96852746, 0.96620799, 0.9642253 , 0.96288455,
0.96128443, 0.96026672, 0.95912503, 0.95806287, 0.95739194,
0.95622575, 0.9545498 , 0.95381055, 0.95318588, 0.95203675,
0.95096276, 0.94996035, 0.94892427, 0.94740811, 0.94582933,
0.94420572, 0.9420396 , 0.9408416 , 0.93969163, 0.93789606,
0.93608273, 0.93413445, 0.93343312, 0.93233588, 0.93093149,
0.93033546, 0.92983324, 0.92910616, 0.92830326, 0.92799811,
0.92642784]),
array([ 2916.11296684, 5797.02377904, 8658.22999328, 11502.6002944 ,
14328.44503612, 17140.72034976, 19940.48013538, 22729.69637912,
25512.09429552, 28286.18290207, 31055.33003897, 33818.82409725,
36577.1270353 , 39332.49361223, 42082.0755955 , 44822.94911057,
47560.49941212, 50295.38504714, 53024.59880222, 55748.57526173,
58467.72758802, 61181.8659989 , 63888.25003765, 66586.53110019,
69276.46332225, 71954.97102175, 74627.57217707, 77294.54406888,
79952.23080669, 82600.54514273, 85238.73829645, 87873.86209917,
90503.68343426, 93126.47509834, 95746.79574474, 98365.17422285,
100980.34471949, 103591.88164688, 106202.58634768, 108805.3453693 ]),
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0.]))
>>>pacf(data)
array([ 1. , 0.98740203, 0.26463067, 0.18709112, 0.11351714,
0.0540612 , 0.06996315, 0.05159168, 0.05358487, 0.06867607,
0.03915513, 0.06099868, 0.04020074, 0.0390229 , 0.05198753,
0.01873783, -0.00169158, 0.04387457, 0.03770717, 0.01360295,
0.01740693, 0.01566421, 0.01409722, -0.00988412, -0.00860644,
-0.00905181, -0.0344616 , 0.0199406 , 0.01123293, -0.02002155,
-0.01415968, -0.0266674 , 0.03583483, 0.0065682 , -0.00483241,
0.0342638 , 0.02353691, 0.01704061, 0.01292073, 0.03163407,
-0.02838961])
How can I get p and q with these functions? The acf function returns only 1 array if qstat is set to False.
Selecting the order of an ARMA(p,q) model using estimated ACFs/PACFs is usually not the best approach, simply because in the case of an ARMA process both the ACF and PACF decay slowly (in absolute terms) for increasing lags, so you cannot really infer the lag orders from them. Instead, they are mostly used for pure AR/MA models, in which you observe a clear cutoff in one of the two series (but even then it is more of a graphical approach).
If you want to determine p and q "on the fly" for an ARMA model, it seems more reasonable to use information criteria (e.g. AIC, BIC, etc.). statsmodels provides the function arma_order_select_ic() for this very purpose. So what you want is something like this:
from statsmodels.tsa.stattools import arma_order_select_ic
arma_order_select_ic(data, max_ar=4, max_ma=4, ic='bic')
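The returned object exposes the selected order directly; per the statsmodels documentation the attribute is named after the chosen criterion, so for ic='bic' a sketch of reading it is:
res = arma_order_select_ic(data, max_ar=4, max_ma=4, ic='bic')
p, q = res.bic_min_order  # the (p, q) pair that minimizes the BIC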

CVXPY 1.0.24 in Python 3.+ isn't solving quad problem correctly

I am trying to maximize cx - xAx with A positive definite, but the solution is not what it should be. Please help.
I tried the problem using this data
import numpy as np
import cvxpy as cp

A = np.array([[1595., 1098., 1133., 0., 0., 0., 0.],
[1191., 1497., 1133., 0., 0., 0., 0.],
[1191., 1098., 1396., 0., 0., 0., 0.],
[ 0., 0., 0., 655., 0., 0., 0.],
[ 0., 0., 0., 0., 1313., 0., 0.],
[ 0., 0., 0., 0., 0., 581., 0.],
[ 0., 0., 0., 0., 0., 0., 536.]])
c = np.array([4673.36981266, 4727.12719741, 5939.49046907, 3867.69830799,
6099.15146109, 5358.10885615, 4885.96523884])
x = cp.Variable(7)
prob = cp.Problem(cp.Maximize(cp.quad_form(x, A) + c.T @ x), [x >= 0])
prob.solve()
I get a DCP error with the code above.
I then tried a Minimize version, but then I get -inf as the answer:
prob = cp.Problem(cp.Minimize(cp.quad_form(x, A) - c.T @ x), [x >= 0])
prob.solve()
The actual optimal solution to Max (cx - xAx) is
np.array([0,0,2.134,2.903,2.359,4.6266,4.508])
with an optimal value of 42586.
I think you forgot to change your constraint after you changed the problem from a maximization to a minimization. For a minimization, you could try x <= 0.
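For reference, a DCP-compliant sketch of the original maximization (my assumption about the intended problem, not code from the question). cp.quad_form expects a symmetric matrix, and xAx equals x((A + A.T)/2)x, so the symmetric part of A is used:
A_sym = (A + A.T) / 2                         # quad_form requires a symmetric matrix
x = cp.Variable(7)
prob = cp.Problem(cp.Maximize(c.T @ x - cp.quad_form(x, A_sym)), [x >= 0])
prob.solve()
print(x.value, prob.value)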

How to keep using values from a list until the diagonal of a matrix is full using itertools

So I am trying to use a smaller list to populate the diagonal of a larger matrix. I thought using the cycle function in itertools would make this an easy task, but I can't seem to get it to work. Here is what I tried:
a = np.zeros((10,10))
b = [1, 2, 3, 4, 5]
for i in range(len(a.shape[0])):
a[i, i] = list(itertools.cycle(b))
but this iterates endlessly. I am hoping that it will stop once the diagonal has been filled. Other options that are more pythonic are greatly appreciated!
You mean to use itertools.cycle, not repeat. The latter repeats the element (the whole list); good luck setting that into a single value, especially if you force iteration (since it runs forever).
I'd create a reference to a cycle object outside the loop and assign values to the diagonal by iterating over it manually (the only proper way with cycle). Also note that your loop range was wrong: a.shape[0] is already an integer dimension, so there is no need for len.
import itertools
import numpy as np

a = np.zeros((10, 10))
b = [1, 2, 3, 4, 5]
iterator = itertools.cycle(b)
for i in range(a.shape[0]):
    a[i, i] = next(iterator)
result:
>>> a
array([[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 2., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 3., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 4., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 5., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 2., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 3., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 4., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 5.]])
As they loop forever, cycle and repeat should not be used in a context of forced iteration (repeat does have an optional times parameter to limit the repeats, though).
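As a more pythonic alternative, note that numpy's fill_diagonal writes an array-like value along the diagonal, repeating it as necessary to fill all diagonal entries, so the loop can be avoided entirely (a sketch):
import numpy as np

a = np.zeros((10, 10))
np.fill_diagonal(a, [1, 2, 3, 4, 5])  # the values repeat until the diagonal is full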

How to slice a matrix with a logical condition?

I can apply the following code to an array.
from numpy import *
A = eye(4)
A[A[:,1] > 0.5,:]
But how can I apply a similar method to a mat?
A = mat(eye(4))
A[A[:,1] > 0.5,:]
I know the above code is wrong, but what should I do?
The problem is that, when A is a numpy.matrix, A[:,1] returns a 2-d matrix, and therefore A[:,1] > 0.5 is also 2-d. Anything that makes this expression look like the one created when A is an ndarray will work. For example, you can write A.A[:,1] > 0.5 (the .A attribute returns an ndarray view of the matrix), or (A[:,1] > 0.5).A1 (the .A1 attribute returns a flattened ndarray).
For example,
In [119]: A
Out[119]:
matrix([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.]])
In [120]: A[(A[:, 1] > 0.5).A1,:]
Out[120]: matrix([[ 0., 1., 0., 0.]])
In [121]: A[A.A[:, 1] > 0.5,:]
Out[121]: matrix([[ 0., 1., 0., 0.]])
Because of quirks like these, I (and many others) recommend avoiding the numpy.matrix class. Most code can be written just as easily by using ndarrays throughout.
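Following that advice, a minimal sketch of side-stepping the matrix quirks by converting up front:
import numpy as np

A = np.matrix(np.eye(4))
B = np.asarray(A)           # plain ndarray view of the same data
print(B[B[:, 1] > 0.5, :])  # boolean row selection now behaves as expected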
