User-defined loss function in LIBLINEAR SVM

In the Java version of LIBLINEAR there is a class called 'SolverType' in which one can choose the type of loss function to optimize, for example 'SolverType.L2LOSS_SVM_DUAL'. Is there any way to define a user-defined loss function?

The short answer is no.
The "loss function" defines the optimization problem, in fact this parameter changes (in particular) this model to
linear regression
logistic regression
support vector machine
While first two are quite similar, third requires completely different machinery to solve it, much more complex methods. In particular one can define very arbitrary functions, which fall into "linear models" category, which are unsolvable (are solvable by very complex techniques).
On the other hand, if the function is very simple, i.e. it is differentiable and unconstrained (the optimization is performed over the whole parameter space), then (assuming you know the analytical form of the derivatives) you can plug it into any gradient-based / steepest-descent solver implementation (there are dozens of such solvers available).
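For concreteness, here is a minimal sketch of that approach using a general-purpose gradient-based solver from SciPy (L-BFGS here rather than plain steepest descent); the squared-hinge-plus-L2 loss and the synthetic data are purely illustrative:

import numpy as np
from scipy.optimize import minimize

# Synthetic data for illustration: X is (n_samples, n_features), y in {-1, +1}
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.sign(X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=100))

def loss_and_grad(w, X, y, C=1.0):
    # A hypothetical smooth loss for a linear model: L2 penalty + squared hinge.
    margins = y * (X @ w)
    viol = np.maximum(0.0, 1.0 - margins)
    loss = 0.5 * w @ w + C * np.sum(viol ** 2)
    grad = w + C * np.sum((-2.0 * viol * y)[:, None] * X, axis=0)
    return loss, grad

res = minimize(loss_and_grad, x0=np.zeros(X.shape[1]),
               args=(X, y), jac=True, method="L-BFGS-B")
w_opt = res.x  # learned weight vector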

SVM is formulated as a QP problem:
minimize ||w||
subject to y * (w'x) >= 1 for all (x, y) in the training dataset
This is the primal form of the problem, and the objective is to minimize the L2 norm of the weight vector w.
If you change the objective ||w||, then it is no longer an SVM. However, you can change the weights of individual training examples. You can find a tutorial here:
http://scikit-learn.org/stable/modules/svm.html#unbalanced-problems
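For example, reweighting with scikit-learn's SVC might look roughly like this (the imbalance and the chosen weights below are purely illustrative):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Illustrative imbalanced data (90% / 10%)
X, y = make_classification(n_samples=200, weights=[0.9, 0.1], random_state=0)

# Up-weight the minority class via class_weight ...
clf = SVC(kernel="linear", class_weight={0: 1.0, 1: 10.0})
clf.fit(X, y)

# ... or pass per-example weights at fit time
weights = np.where(y == 1, 10.0, 1.0)
clf_w = SVC(kernel="linear")
clf_w.fit(X, y, sample_weight=weights)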

Related

Use pytorch optimizer to fit a user defined function

I have read many tutorials on how to use PyTorch to make a regression over a data set, using, for instance, a model composed of several linear layers and the MSE loss.
Well, imagine that I know that the function F depends on a variable x and some unknown parameters (p_j: j=0,..., P-1) with P relatively small, but that the function is a composition of special functions. So my problem is the classical minimization, given the data {x_i, y_i}_{i<=N}:
Min_{ {p_j} } Sum_i (F(x_i;{p_j}) - y_i)^2
I would like to know whether I can use the PyTorch optimizers and, if so, how to do it.
Thanks.
In fact, PyTorch experts answered that the function to be minimized must be expressed in terms of torch tensors so that the optimizers can compute the gradients. So it is not possible in my case.
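For completeness, here is a minimal sketch of what the approach looks like when F can be written entirely with torch operations (the form of F, its true parameters, and the synthetic data below are made up for illustration; if F relies on special functions with no torch equivalent, this does not apply, as noted above):

import torch

# Synthetic data: y ~ 2 * exp(-3 * x) + noise
x = torch.linspace(0.0, 1.0, 100)
y = 2.0 * torch.exp(-3.0 * x) + 0.1 * torch.randn(100)

# Unknown parameters p_0, p_1, initialized arbitrarily
p = torch.tensor([1.0, -1.0], requires_grad=True)

def F(x, p):
    return p[0] * torch.exp(p[1] * x)

opt = torch.optim.Adam([p], lr=0.05)
for _ in range(2000):
    opt.zero_grad()
    loss = torch.sum((F(x, p) - y) ** 2)   # Sum_i (F(x_i; p) - y_i)^2
    loss.backward()
    opt.step()

print(p.detach())   # should end up close to (2, -3) for this toy example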

Specify log-normal family in BAMBI model

I'm trying to fit a simple Bayesian regression model to some right-skewed data. I thought I'd try setting the family to a log-normal distribution. I'm using the pymc3 wrapper BAMBI. Is there a way to build a custom family with a log-normal distribution?
It depends on what you want the mean function of the model to look like.
If you want a model like
log(Y) = b0 + b1*X + e, with e ~ Normal(0, sigma)
then Yes, this is easily achieved by simply log transforming Y and then estimating the usual linear model with a Normal response. Notice that in this model Y is an exponential function of the predictor X, so when plotting Y against X (both untransformed), the regression line can curve up or down. It also has a multiplicative error term, so that the variance is greater for larger predicted Y values. We can say that such a model has a log link function and a lognormal response.
But if you want a model like
Y = b0 + b1*X + e, with e ~ Lognormal(mu, sigma)
then No, this kind of model is not currently supported by bambi*. This is a model with a lognormal response but an identity link function. The regression of Y on X is a straight line, but the errors have the same lognormal distribution at every point along X, so the variance does not increase for larger predicted Y values. Note that this is an unusual model that I personally have never actually seen used.
* It's possible in theory to roll your own custom Families (although it would require some slight hacking), but the way this is designed in bambi ultimately depends on the families implemented in statsmodels.genmod, which does not currently include lognormal.
Unless I'm misunderstanding something, I think all you need to do is specify link='log' in the fit() call. If your assumption is correct, the exponentiated linear prediction will be normally distributed, and the default error distribution is gaussian, so I don't think you need to build a custom family for this—the default gaussian family with a log link should work fine. But feel free to clarify if this doesn't address your question.
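As a rough sketch of the log-transform route (the column names and the exact bambi call are assumptions; the API has changed across versions):

import numpy as np
import pandas as pd
import bambi as bmb

# Hypothetical right-skewed data with a positive response 'y' and predictor 'x'
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=200)})
df["y"] = np.exp(0.5 + 1.2 * df["x"] + rng.normal(scale=0.3, size=200))

# Log transform the response and fit the default gaussian family:
# a lognormal response with a log link, the first model described above.
df["log_y"] = np.log(df["y"])
model = bmb.Model("log_y ~ x", df)   # family="gaussian" is the default
results = model.fit()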

Improving linear regression model by taking absolute value of predicted output?

I have a particular regression problem that I was able to improve using Python's abs() function. I am still somewhat new to machine learning, and I wanted to know if what I am doing is actually "allowed", so to speak, for improving a regression problem. The following lines show my method:
from sklearn import linear_model
from sklearn.model_selection import cross_val_predict
lr = linear_model.LinearRegression()
predicted = abs(cross_val_predict(lr, features, labels_postop_IS, cv=10))
I attempted this solution because linear regression can sometimes produce negative predicted values, even though in my particular case these predictions should never be negative, as they represent a physical quantity.
Using the abs() function, my predictions produce a better fit for the data.
Is this allowed?
Why would it not be "allowed"? If you want to make certain statistical statements (like a 95% confidence interval, for example), you need to be careful. However, most ML practitioners do not care too much about the underlying statistical assumptions and just want a black-box model that can be evaluated on accuracy or some other performance metric. So basically everything is allowed in ML; you just have to be careful not to overfit. Maybe a more sensible solution to your problem would be to use a function that truncates at 0, like f(x) = x if x > 0 else 0. This way large negative values don't suddenly become large positive ones.
On a side note, you should probably try some other models as well, with more parameters, like an SVR with a non-linear kernel. The thing is, obviously, that a linear regression fits a line, and if this line is not parallel to your x-axis (thinking in the single-variable case) it will inevitably produce negative values at some point along the line. That's one reason why it is often advised not to use linear regression for predictions outside the range of the fitted data.
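A sketch of the truncation idea, reusing the variable names from the question (features and labels_postop_IS are assumed to be the original data):

import numpy as np
from sklearn import linear_model
from sklearn.model_selection import cross_val_predict

lr = linear_model.LinearRegression()
raw_pred = cross_val_predict(lr, features, labels_postop_IS, cv=10)

# Truncate at zero instead of taking the absolute value, so that large
# negative predictions become 0 rather than large positive ones.
clipped_pred = np.clip(raw_pred, 0, None)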
A straight line y = a + bx will predict negative y for some x unless a > 0 and b = 0. Using a logarithmic scale seems like a natural solution to fix this.
In the case of linear regression, there is no restriction on your outputs.
If your data is non-negative (as in your case, where the values are physical quantities and cannot be negative), you could model it using a generalized linear model (GLM) with a log link function. One common example is Poisson regression, which is useful for modeling discrete non-negative counts such as the problem you described. The Poisson distribution is parameterized by a single value λ, which is both the expected value and the variance of the distribution.
I cannot say your approach is wrong, but a better way is to go with the method above.
Roughly speaking, a log link means you are fitting a linear model to the log of the expected value of your observations, so predictions are guaranteed to be non-negative.
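A minimal sketch of such a log-link GLM with scikit-learn (PoissonRegressor, available in newer scikit-learn versions; the synthetic count data is purely illustrative):

import numpy as np
from sklearn.linear_model import PoissonRegressor

# Illustrative non-negative count targets
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = rng.poisson(np.exp(X @ np.array([0.3, -0.2, 0.5])))

# A GLM with a log link: predictions are exp(linear term), never negative
glm = PoissonRegressor()
glm.fit(X, y)
preds = glm.predict(X)
assert (preds >= 0).all()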

Learning method in keras? [duplicate]

While digging through the topic of neural networks and how to efficiently train them, I came across the method of using very simple activation functions, such as the rectified linear unit (ReLU), instead of the classic smooth sigmoids. The ReLU function is not differentiable at the origin, so according to my understanding the backpropagation algorithm (BPA) is not suitable for training a neural network with ReLUs, since the chain rule of multivariable calculus applies only to smooth functions.
However, none of the papers about using ReLUs that I read address this issue. ReLUs seem to be very effective and seem to be used virtually everywhere while not causing any unexpected behavior. Can somebody explain to me why ReLUs can be trained at all via the backpropagation algorithm?
To understand how backpropagation is even possible with functions like ReLU, you need to understand which property of the derivative makes the backpropagation algorithm work so well. That property is:
f(x) ~ f(x0) + f'(x0)(x - x0)
If you treat x0 as the current value of your parameter, then (knowing the value of the cost function and its derivative) you can tell how the cost function will behave when you change the parameter a little bit. This is the most crucial thing in backpropagation.
Because backpropagation relies on this local linear approximation of the cost, you need the functions involved to satisfy the property stated above. It is easy to check that ReLU satisfies it everywhere except in a small neighbourhood of 0. And this is the only problem with ReLU: the fact that we cannot use this property when we are close to 0.
To overcome that, you may simply define the value of the ReLU derivative at 0 to be either 1 or 0. On the other hand, most researchers don't treat this as a serious problem, simply because landing close to 0 during ReLU computations is relatively rare.
Given the above, from a purely mathematical point of view it is of course not fully justified to use ReLU with the backpropagation algorithm. In practice, however, the odd behaviour around 0 usually makes no difference.
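To make the convention concrete, here is a tiny NumPy sketch of ReLU and the (sub)derivative typically used in practice (choosing 0 at the origin is an arbitrary but common convention):

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def relu_grad(x):
    # Define the derivative at exactly 0 to be 0 (it could just as well be 1);
    # everywhere else the derivative is unambiguous.
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]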

Can anyone explain all the parameters of sklearn SVM.SVC in a simplified manner?

I am trying to learn the SVC classifier of the SVM module in sklearn. I have learned to use it on various datasets and have even applied grid search to improve the results, but I have not yet understood some parameters like C and gamma.
If anyone can give me a simple but detailed explanation of each parameter, that would be great.
Since we are trying to minimize some objective function, we can add a 'size' measure of the coefficient vector itself to that function. C is essentially the inverse of the weight on that regularization term. Decreasing C will prevent overfitting by forcing the coefficients to be sparse or small, depending on the penalty; decreasing it too much, however, will promote underfitting.
Gamma is a parameter for the RBF kernel. Increasing gamma allows for a more complex decision boundary (which can lead to overfitting, but can also improve results--it depends on the data).
This scikit-learn tutorial graphically shows the effect of changing both hyperparameters.
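A minimal grid-search sketch over both hyperparameters (the dataset and grid values are arbitrary illustrations):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Small C -> stronger regularization; large gamma -> more complex RBF boundary
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)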
