I am pretty new to GPR. I would appreciate any suggestions regarding the following questions:
Can we use the Matern52 kernel in a sparse Gaussian process?
What is the best way to select the pseudo-inputs (Z)? Is random sampling reasonable?
I would like to mention that when I am using the Matern52 kernel, the following error stops the optimization process. My code:
k1 = gpflow.kernels.Matern52(input_dim=X_train.shape[1], ARD=True)
m = gpflow.models.SGPR(X_train, Y_train, kern=k1, Z=X_train[:50, :].copy())
InvalidArgumentError (see above for traceback): Input matrix is not invertible.
[[Node: gradients_25/SGPR-31ceaea6-412/Cholesky_grad/MatrixTriangularSolve = MatrixTriangularSolve[T=DT_DOUBLE, adjoint=false, lower=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](SGPR-31ceaea6-412/Cholesky, SGPR-31ceaea6-412/eye_1/MatrixDiag)]
Any help will be appreciated, thank you.
Have you tried it out on a small test set of data that you could perhaps post here? There is no reason Matern52 shouldn't work. Randomly sampling inducing points should be a reasonable initialisation, especially in higher dimensions. However, you may run into issues if you end up with some inducing points very close to each other (this can make the K_{zz} = cov(f(Z), f(Z)) matrix badly conditioned, which would explain why the Cholesky fails). If your X_train isn't already shuffled, you may want to use Z=X_train[np.random.permutation(len(X_train))[:50]] to get shuffled indices. It may also help to add a white noise kernel, kern=k1+gpflow.kernels.White().
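Putting both suggestions together, a minimal sketch against the GPflow 1.x API used in the question (X_train/Y_train are assumed to be your arrays; the White variance of 1e-4 is just an arbitrary small starting value):
import numpy as np
import gpflow

# Matern52 plus a small White kernel to keep K_zz well conditioned
k = (gpflow.kernels.Matern52(input_dim=X_train.shape[1], ARD=True)
     + gpflow.kernels.White(input_dim=X_train.shape[1], variance=1e-4))

# shuffle before slicing so the 50 inducing points are spread over the data
Z = X_train[np.random.permutation(len(X_train))[:50]].copy()

m = gpflow.models.SGPR(X_train, Y_train, kern=k, Z=Z)
gpflow.train.ScipyOptimizer().minimize(m)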
I want to solve a system of nonlinear equations using scipy.optimize.root. For performance reasons, I want to provide the Jacobian of the system as a LinearOperator. However, I cannot get it to work. Here is a minimal example using the gradient of the Rosenbrock function, where I first define the Jacobian (i.e. the Hessian of the Rosenbrock function) as a LinearOperator.
import numpy as np
import scipy.optimize as opt
import scipy.sparse as sp
ndim = 10
def rosen_hess_LO(x):
    # Hessian of the Rosenbrock function at x, wrapped as a matrix-free LinearOperator
    return sp.linalg.LinearOperator(
        (ndim, ndim),
        matvec=lambda dx, xl=x: opt.rosen_hess_prod(xl, dx),
    )

opt_result = opt.root(fun=opt.rosen_der, x0=np.zeros(ndim, float), jac=rosen_hess_LO)
Upon execution, I get the following error:
TypeError: fsolve: there is a mismatch between the input and output shape of the 'fprime' argument 'rosen_hess_LO'. Shape should be (10, 10) but it is (1,).
What am I missing here?
Partial answer:
I was able to feed my "exact" Jacobian into scipy.optimize.nonlin.nonlin_solve. This really felt hacky. (As far as I can tell, the error above arises because the default 'hybr' method of scipy.optimize.root wraps MINPACK, which expects jac to return a dense array, so a LinearOperator gets coerced to something with the wrong shape.)
Long story short, I defined a class inheriting from scipy.optimize.nonlin.Jacobian, with "update" and "solve" methods defined so that my exact Jacobian would be used by the solver; a sketch follows below.
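A minimal sketch of what that class might look like, using the Rosenbrock example from above (ExactRosenJacobian is my name, the GMRES inner solve is one reasonable choice, and this is written against the older scipy.sparse.linalg.gmres signature with tol=):
import numpy as np
import scipy.optimize as opt
import scipy.sparse.linalg as spla
from scipy.optimize.nonlin import Jacobian, nonlin_solve

ndim = 10

class ExactRosenJacobian(Jacobian):
    # Exact Jacobian (the Rosenbrock Hessian), applied matrix-free.
    def setup(self, x, F, func):
        Jacobian.setup(self, x, F, func)
        self.x = x.copy()

    def update(self, x, F):
        # Remember the current iterate; solve() needs it for Hessian-vector products.
        self.x = x.copy()

    def solve(self, v, tol=0):
        # Solve H(x) s = v using only Hessian-vector products.
        op = spla.LinearOperator((ndim, ndim),
                                 matvec=lambda dx: opt.rosen_hess_prod(self.x, dx))
        s, info = spla.gmres(op, v, tol=max(tol, 1e-10))
        return s

x_sol = nonlin_solve(opt.rosen_der, np.zeros(ndim), jacobian=ExactRosenJacobian())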
I expect performance results to vary greatly from problem to problem. Let me detail my experience for a ~10k-dimensional critical-point solve of an "almost" coercive function (i.e. the problem would be coercive if I had taken the time to remove a 4-dimensional symmetry generator), with many, many local minima (and thus presumably many, many critical points).
Long story short, this gave terrible results far from the optimum, but local convergence was achieved in fewer optimization cycles. The cost of each of those cycles was (for my particular problem) far greater than the "standard" Krylov lgmres, so in the end, even close to the optimum, I cannot really say it was worth the trouble.
To be honest, I am very impressed with the Jacobian finite difference approximation of the 'krylov' method of scipy.optimize.root.
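For reference, that matrix-free 'krylov' method is just the standard scipy.optimize.root API; a quick sketch on the same Rosenbrock gradient (the lgmres inner method and the zero starting point are arbitrary choices):
import numpy as np
import scipy.optimize as opt

# root-finding on the Rosenbrock gradient; the critical point is at np.ones(10)
sol = opt.root(opt.rosen_der, np.zeros(10), method="krylov",
               options={"jac_options": {"method": "lgmres"}})
print(sol.success, sol.x)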
I'm using detectron2 to solve a segmentation task.
I'm trying to classify an object into 4 classes, so I have used COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml.
I have applied 4 kinds of augmentation transforms, and after training I get a total loss of about 0.1.
But for some reason the bbox accuracy is not great on some images in the test set: the bbox is drawn either larger or smaller than the object, or doesn't cover the whole object.
Moreover, the predictor sometimes draws several bboxes, assuming there are several different objects although there is only a single one.
Are there any suggestions for how to improve its accuracy, or any good-practice approaches to resolve this issue?
Any suggestion or reference material will be helpful.
I would suggest the following:
1. Ensure that your training set contains the object you want to detect in all sizes: this way, the network learns that the size of the object can vary and is less prone to overfitting (the detector could otherwise assume your object is always big, for example).
2. Add data. Rather than applying all types of augmentations, try adding much more data. The phenomenon of detecting several objects where there is only one leads me to believe that your network does not generalize well. Personally I would opt for at least 500 annotations per class.
The biggest step towards improvement will be achieved by means of (2).
Once you have a decent baseline, you could also experiment with augmentations.
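If you do get to that point, here is a sketch of scale-varying augmentations via a custom train loader (this assumes a registered dataset "my_dataset_train" and detectron2's DatasetMapper with the augmentations argument, available in v0.3+; the concrete transform values are placeholders):
import detectron2.data.transforms as T
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data import DatasetMapper, build_detection_train_loader
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("my_dataset_train",)

class AugTrainer(DefaultTrainer):
    @classmethod
    def build_train_loader(cls, cfg):
        # resize to several shortest-edge sizes so objects appear at varied scales
        mapper = DatasetMapper(cfg, is_train=True, augmentations=[
            T.ResizeShortestEdge([480, 640, 800], max_size=1333, sample_style="choice"),
            T.RandomFlip(horizontal=True),
            T.RandomBrightness(0.8, 1.2),
        ])
        return build_detection_train_loader(cfg, mapper=mapper)

# then: trainer = AugTrainer(cfg); trainer.resume_or_load(resume=False); trainer.train()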
I am working on fitting a Weibull distribution to some integer data and estimating the relevant shape, scale, and location parameters. However, I noticed poor performance of the scipy.stats library while doing so.
So I took a different direction and checked the fit performance using the code below. I first create 100 numbers from a Weibull distribution with parameters shape=3, scale=200, location=1. Subsequently, I estimate the best distribution fit using the fitter library.
from fitter import Fitter
import numpy as np
from scipy.stats import weibull_min
# generate numbers
x = weibull_min.rvs(3, scale=200, loc=1, size=100)
# make them integers
data = np.asarray(x, dtype=int)
# fit one of the four distributions
f = Fitter(data, distributions=["gamma", "rayleigh", "uniform", "weibull_min"])
f.fit()
f.summary()
I expect the best fit to be the Weibull distribution. I have tried re-running this test; sometimes the Weibull fit is a good estimate, but most of the time it is reported as the worst result. In this case, the estimated parameters are (0.13836651040093312, 66.99999999999999, 1.3200752378443505). I assume these correspond to shape, scale, and location, in that order. Below is the summary of the fit procedure.
$ f.summary()
sumsquare_error aic bic kl_div
gamma 0.001601 1182.739756 -1090.410631 inf
rayleigh 0.001819 1154.204133 -1082.276256 inf
uniform 0.002241 1113.815217 -1061.400668 inf
weibull_min 0.004992 1558.203041 -976.698452 inf
Additionally, the following plot is produced.
Also, the Rayleigh distribution is a special case of the Weibull distribution with shape parameter = 2, so I expect the resulting Weibull fit to be at least as good as the Rayleigh one.
Update
I ran the tests above on a Linux/Ubuntu 20.04 machine with numpy version 1.19.2 and scipy version 1.5.2. The same code runs as expected and returns proper results for the Weibull distribution on a Mac machine.
I have also tested fitting a Weibull distribution on the data x generated above, on the Linux machine, using the R library fitdistrplus:
fit.weib <- fitdist(x, "weibull")
and observed that the estimated shape and scale values are very close to the initially given values. My best guess so far is that the problem is due to some Python-Ubuntu bug/incompatibility.
I can be considered a newbie in this area. So I am wondering: am I doing something wrong here, or is this result somehow expected? Any help is greatly appreciated.
Thank you.
The fitter library doesn't allow you to specify fixed parameters for distributions, such as a, loc, etc. And strangely, Mac produces a better fit while Linux heavily skews the results for the best fit, for the same versions of Numpy and Scipy. Underlying reasons may include different BLAS/LAPACK implementations on Linux and Mac (https://stackoverflow.com/a/49274049/6806531), weibull_min not initializing the parameter a = 1 (which is discussed online), or the default floating-point accuracy. However, one can work around the issue inside the fitter library. Knowing that weibull_min is exponweib with the parameter a fixed at 1, change the run function inside the _timed_run function in fitter.py to:
def run(self):
    try:
        if distribution == "exponweib":
            # fixing a=1 makes exponweib equivalent to weibull_min
            self.result = func(args, floc=0, fa=1, **kwargs)
        else:
            self.result = func(args, floc=0, **kwargs)
    except Exception as err:
        self.exc_info = sys.exc_info()
and using exponweib in place of weibull_min gives nearly the same results as R's fitdist.
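The same trick also works directly in scipy.stats, without patching fitter; a quick sketch (the random_state is mine, for reproducibility):
import numpy as np
from scipy.stats import exponweib, weibull_min

x = weibull_min.rvs(3, scale=200, loc=1, size=100, random_state=0)
data = np.asarray(x, dtype=int)

# exponweib with its first shape parameter fixed at 1 is exactly weibull_min,
# so fa=1 (plus floc=0) recovers the Weibull shape c and scale
a, c, loc, scale = exponweib.fit(data, fa=1, floc=0)
print(c, scale)  # should land in the neighbourhood of 3 and 200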
I am not familiar with the Fitter library, but in order to draw some conclusions I would suggest:
Retry your code, but with size=10000. In this case, there are sufficient data points for the fitting methods to work with; theoretically, you would then expect the Weibull to deliver the best fit.
I noticed that the location parameter can sometimes be a pain. You could try to run your fits while fixing the location parameter with floc=1 (i.e. equal to your sampling parameter for location). What do you get? Additionally, FYI: with MLE it suffices to take loc=min(x), where x is your dataset. For the exponential distribution, this is in fact the MLE of the location parameter. For other distributions I am not sure, but I wouldn't be surprised if this holds for them as well. This would reduce the fitting procedure by one parameter (see the sketch below).
Lastly, I noticed that if you take small values for location/scale/shape, the logpdf and logcdf functions of scipy.stats distributions can return np.inf. In that scenario, you could perhaps use the Powell optimization algorithm and set bounds on the values of your parameters.
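A short sketch of the first two suggestions combined (plain scipy.stats; the random_state is an arbitrary choice):
import numpy as np
from scipy.stats import weibull_min

x = weibull_min.rvs(3, scale=200, loc=1, size=10000, random_state=0)

# fix the location at the known value so only shape and scale are estimated
shape, loc, scale = weibull_min.fit(x, floc=1)
print(shape, loc, scale)  # expect roughly (3, 1, 200)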
I am trying to fit data using the standard built-in models (Lorentzian & Gaussian) from the lmfit package. The program works quite well for some data sets, but for others it is not able to fit because the initial values don't seem right. Is there an algorithm which can extract the initial values from the data set and iterate to find the best fit?
I tried some common methods like a brute-force algorithm, but the results are not satisfactory and it costs a lot of time.
It is always recommended to provide a small, complete example script that shows the problem you are having. How could we know why it works in some cases and not in others?
lmfit.GaussianModel and lmfit.LorentzianModel both have guess methods. These should work reasonably well for data with an isolated peak, used like this:
import lmfit
model = lmfit.models.GaussianModel()
params = model.guess(ydata, x=xdata)
for p in params.values():
    print(p)
result = model.fit(ydata, params, x=xdata)
print(result.fit_report())
If the data doesn't have a clear isolated peak, that might not work so well.
If finding the peak(s) is the actual problem, try scipy.signal.find_peaks
(https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks.html) or peakutils (https://peakutils.readthedocs.io/en/latest/). Either of these should give you a good estimate of the center parameter, which is probably the one most likely to cause bad fits if given a poor initial value.
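A small sketch of seeding the guess with find_peaks, assuming the xdata/ydata arrays from above (the prominence threshold is an arbitrary choice):
import numpy as np
from scipy.signal import find_peaks
import lmfit

peaks, _ = find_peaks(ydata, prominence=0.1 * np.ptp(ydata))

model = lmfit.models.GaussianModel()
params = model.guess(ydata, x=xdata)
if len(peaks) > 0:
    # overwrite the guessed center with the location of the first detected peak
    params["center"].set(value=xdata[peaks[0]])

result = model.fit(ydata, params, x=xdata)
print(result.fit_report())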
I'm normalizing my data to zero mean and unit variance, as recommended in most literature, to pre-train a GB-RBM. But whatever learning rate I choose and however many epochs I train for, my mean reconstruction error never drops below about 0.6.
Reconstruction errors for the stacked BB-RBMs easily drop to 0.01 within a few epochs. I've used several toolkits which implement GB-RBMs as described in http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf, but all have the same issue. Am I missing something, or is the reconstruction error meant to stay above 50%?
I'm normalizing my data by subtracting the mean and dividing by the standard deviation along each dimension of the input vector:
% size(mfcc) --> [mlength rows x 39 cols]
mmean = mean(mfcc);
mstd = std(mfcc);
mfcc = mfcc - ones(mlength,1)*mmean;
mfcc = mfcc ./ (ones(mlength,1)*mstd);
This does give me zero mean and unit variance along each dimension. I have tried different datasets, different features and different toolkits, but my reconstruction error never drops below 0.6 for GB-RBMs.
Thanks
I would guess you are using exp() in the sigmoid and then using a 3rd-party library to do the matrix functions?
If the above is true, I would guess the 3rd-party library is swallowing the exp() overflow errors but still stopping the calculation, so the hidden/recreated vectors are invalid.
Edit, based on the comment below:
theano.tensor.nnet.sigmoid() uses exp(), so I would first try switching to hard_sigmoid(). It won't be as nice a curve, but it won't overflow/underflow, so you can check whether that is the source of the error.
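A minimal sketch of that swap in Theano (the layer sizes and variable names are placeholders, not taken from the question):
import numpy as np
import theano
import theano.tensor as T
from theano.tensor.nnet import hard_sigmoid  # piecewise linear, no exp()

v = T.matrix("v")                                  # visible units, e.g. 39-dim MFCCs
W = theano.shared(0.01 * np.random.randn(39, 100), name="W")
b = theano.shared(np.zeros(100), name="b")

# hard_sigmoid cannot overflow for large pre-activations, unlike sigmoid()
h = hard_sigmoid(T.dot(v, W) + b)
hidden = theano.function([v], h)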
I assume you tried other data preprocessing and still had the high reconstruction errors?