Use pytorch optimizer to fit a user defined function - pytorch

I have read many tutorials on how to use PyTorch to make a regression over a data set, using, for instance, a model composed of several linear layers and the MSE loss.
Now, imagine that I know the function F, which depends on a variable x and some unknown parameters (p_j: j=0,..., P-1) with P relatively small, but the function is a composition of special functions. So my problem is the classical minimization, given the data {x_i,y_i}_i<=N:
Min_{ {p_j} } Sum_i (F(x_i;{p_j}) - y_i)^2
I would like to know if I can use the PyTorch optimizers and if yes how I can do it?
Thanks.

In fact, PyTorch experts answered that the function to be minimized must be expressed in terms of torch tensors so that the optimizers can compute the gradients. So it is not possible in my case.
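For the case where F can be written with differentiable torch operations, a minimal sketch of the least-squares fit might look like the following. The model F(x; p) = p[0] * erf(p[1] * x), the synthetic data, the starting point, and the Adam settings are all assumptions chosen for illustration, not something taken from the question.

```python
import torch

def F(x, p):
    # Hypothetical user-defined model built only from torch operations,
    # so autograd can differentiate it with respect to p.
    return p[0] * torch.erf(p[1] * x)

# Synthetic, noise-free data generated from assumed "true" parameters.
x = torch.linspace(-3.0, 3.0, 50)
true_p = torch.tensor([2.0, 0.5])
y = F(x, true_p)

# Unknown parameters, initialised away from the true values.
p = torch.tensor([1.0, 1.0], requires_grad=True)
optimizer = torch.optim.Adam([p], lr=0.02)

for step in range(3000):
    optimizer.zero_grad()
    loss = torch.sum((F(x, p) - y) ** 2)  # Sum_i (F(x_i; p) - y_i)^2
    loss.backward()                       # gradients of the loss w.r.t. p
    optimizer.step()

fitted = p.detach().tolist()
final_loss = loss.item()
print("fitted parameters:", fitted, "final loss:", final_loss)
```

If F relies on special functions that have no torch implementation, this approach does not apply directly, which is the limitation described above.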

Related

Specify log-normal family in BAMBI model

I'm trying to fit a simple Bayesian regression model to some right-skewed data. Thought I'd try setting family to a log-normal distribution. I'm using pymc3 wrapper BAMBI. Is there a way to build a custom family with a log-normal distribution?
It depends on what you want the mean function of the model to look like.
If you want a model in which log(Y) is a linear function of X plus a Normal error term, then yes: this is easily achieved by simply log-transforming Y and then estimating the usual linear model with a Normal response. Notice that in this model Y is an exponential function of the predictor X, so when plotting Y against X (both untransformed), the regression line can curve up or down. It also has a multiplicative error term, so the variance is greater for larger predicted Y values. We can say that such a model has a log link function and a lognormal response.
But if you want a model in which Y is a linear function of X plus a lognormally distributed error term, then no: this kind of model is not currently supported by bambi*. This is a model with a lognormal response but an identity link function. The regression of Y on X is a straight line, but the errors have the same lognormal distribution at every point along X, so the variance does not increase for larger predicted Y values. Note that this is an unusual model that I personally have never actually seen used.
* It's possible in theory to roll your own custom Families (although it would require some slight hacking), but the way this is designed in bambi ultimately depends on the families implemented in statsmodels.genmod, which does not currently include lognormal.
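The first case (log-transform Y, then fit an ordinary linear model) can be sketched without bambi at all. The snippet below uses plain Python and closed-form simple least squares; the coefficients b0_true and b1_true and the noise-free synthetic data are assumptions made only so the result is easy to check.

```python
import math

# Hypothetical data: Y = exp(b0 + b1 * X), generated without noise.
b0_true, b1_true = 0.5, 0.3
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [math.exp(b0_true + b1_true * x) for x in xs]

# Step 1: log-transform the response.
log_ys = [math.log(y) for y in ys]

# Step 2: ordinary least squares of log(Y) on X (closed form, one predictor).
n = len(xs)
mean_x = sum(xs) / n
mean_ly = sum(log_ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (ly - mean_ly) for x, ly in zip(xs, log_ys))
b1 = sxy / sxx
b0 = mean_ly - b1 * mean_x

def predict(x):
    # Back-transform to the original scale: an exponential curve in x.
    return math.exp(b0 + b1 * x)

print("intercept:", b0, "slope:", b1, "prediction at x=2:", predict(2.0))
```

With noisy data the same two steps apply; the back-transformed curve then tracks the median of Y rather than its mean.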
Unless I'm misunderstanding something, I think all you need to do is specify link='log' in the fit() call. If your assumption is correct, the exponentiated linear prediction will be normally distributed, and the default error distribution is gaussian, so I don't think you need to build a custom family for this—the default gaussian family with a log link should work fine. But feel free to clarify if this doesn't address your question.

Random Forest Regressor using a custom objective/ loss function (Python/ Sklearn)

I want to build a Random Forest Regressor to model count data (Poisson distribution). The default 'mse' loss function is not suited to this problem. Is there a way to define a custom loss function and pass it to the random forest regressor in Python (Sklearn, etc..)?
Is there any implementation to fit count data in Python in any packages?
In sklearn this is currently not supported. See discussion in the corresponding issue here, or this for another class, where they discuss reasons for that a bit more in detail (mainly the large computational overhead for calling a Python function).
So it could be done as discussed within the issues, by forking sklearn, implementing the cost function in Cython and then adding it to the list of available 'criterion'.
If the problem is that the counts c_i arise from different exposure times t_i, then indeed one cannot fit the counts, but one can still fit the rates r_i = c_i/t_i using MSE loss function, where one should, however, use weights proportional to the exposures, w_i = t_i.
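One way to see why the weights should be proportional to the exposures: a regression-tree leaf predicts a single constant, and the constant that minimizes the exposure-weighted MSE of the rates is exactly the pooled rate (total counts divided by total exposure). The counts and exposures below are made-up numbers used only to illustrate this.

```python
# Hypothetical counts c_i observed over exposure times t_i.
counts = [2, 9, 30]
exposures = [1.0, 3.0, 10.0]
rates = [c / t for c, t in zip(counts, exposures)]  # r_i = c_i / t_i

def weighted_mse(pred, targets, weights):
    # Weighted mean squared error of a constant prediction.
    total = sum(w * (pred - r) ** 2 for r, w in zip(targets, weights))
    return total / sum(weights)

# The minimizer of the weighted MSE is the weighted mean of the rates.
weighted_mean = sum(w * r for r, w in zip(rates, exposures)) / sum(exposures)

# With w_i = t_i this equals total counts over total exposure.
pooled_rate = sum(counts) / sum(exposures)

# The unweighted mean treats a 1-unit exposure like a 10-unit exposure.
unweighted_mean = sum(rates) / len(rates)

print("weighted mean of rates:", weighted_mean)
print("pooled rate:", pooled_rate)
print("unweighted mean of rates:", unweighted_mean)
```

In scikit-learn, such weights would be passed through the sample_weight argument of the regressor's fit method.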
For a true Random Forest Poisson regression, I've seen that in R there is the rpart library for building a single CART tree, which has a Poisson regression option. I wish this kind of algorithm had been ported to scikit-learn.
In R, writing a custom objective function is fairly simple.
The randomForestSRC package in R has a provision for writing your own custom split rule. The custom split rule, however, has to be written in pure C.
All you have to do is write your own custom split rule, register it, then compile and install the package.
The custom split rule has to be defined in the file called splitCustom.c in randomForestSRC source code.
You can find more info here.
The file in which you define the split rule is this.

How to get started with Tensorflow

I am pretty new to TensorFlow, and I am currently learning it through the tutorial at https://www.tensorflow.org/get_started/get_started
It is said in the manual that:
We've created a model, but we don't know how good it is yet. To evaluate the model on training data, we need a y placeholder to provide the desired values, and we need to write a loss function.
A loss function measures how far apart the current model is from the provided data. We'll use a standard loss model for linear regression, which sums the squares of the deltas between the current model and the provided data. linear_model - y creates a vector where each element is the corresponding example's error delta. We call tf.square to square that error. Then, we sum all the squared errors to create a single scalar that abstracts the error of all examples using tf.reduce_sum:
q1. "We don't know how good it is yet." I didn't understand this quote. The model created is just a simple line, so what should it be trained for? Does it require a perfect slope? Why am I training that model, and for what?
q2. What is a loss function? Is the loss function used to determine the accuracy of the model? Why is it required?
q3. I didn't understand "sums the squares of the deltas between the current model and the provided data."
q4. I didn't understand this part of the code: "squared_deltas = tf.square(linear_model - y)"
This is the code:
y = tf.placeholder(tf.float32)
squared_deltas = tf.square(linear_model - y)
loss = tf.reduce_sum(squared_deltas)
print(sess.run(loss, {x:[1,2,3,4], y:[0,-1,-2,-3]}))
These may be simple questions, but I am a beginner with TensorFlow and am having a hard time understanding it.
1) So you're kind of right about "Why should we train for a simple problem" but this is just an introduction piece. With any machine learning task you need to evaluate your model to see how good it is. In this case you are just trying to train to find the coefficients for the line of best fit.
2) A loss function in any machine learning context represents the error of your model. This usually means a function of the "distance" between your calculated value and the ground truth value. Think of it as an internal evaluation score. You want to minimise your loss, so the gradients and parameter changes are based on your loss.
3/4) Your question here is more to do with least squares regression. It's a statistical method for fitting lines of best fit through points. The deltas represent the differences between your calculated values and the true values. The aim is to minimise the total area of the squares built on those deltas, and hence minimise the error and get a better line of best fit.
What you are doing in this Tensorflow example is creating a machine learning model that will learn the coefficients for the line of best fit automatically using a least squares based system.
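The "sum of the squares of the deltas" can be worked through by hand in plain Python. The x and y values are the ones fed to sess.run in the question; the coefficients W = 0.3 and b = -0.3 are an assumption, since the snippet in the question does not show how linear_model was defined.

```python
# Assumed model coefficients (not shown in the question's snippet).
W = 0.3
b = -0.3

# Inputs and desired outputs, as passed to sess.run in the question.
x = [1, 2, 3, 4]
y = [0, -1, -2, -3]

# linear_model = W * x + b, evaluated for every example.
linear_model = [W * xi + b for xi in x]

# Each delta is the model's output minus the desired output.
deltas = [m - yi for m, yi in zip(linear_model, y)]

# Equivalent of tf.square: square every delta.
squared_deltas = [d ** 2 for d in deltas]

# Equivalent of tf.reduce_sum: collapse to a single scalar loss.
loss = sum(squared_deltas)

print("deltas:", deltas)
print("loss:", loss)
```

Under these assumed coefficients the deltas are roughly 0, 1.3, 2.6 and 3.9, so the loss is about 23.66; a line that passed through every point would give a loss of 0.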
Pretty much all of your questions have to do with the loss function.
The loss function is a function that determines how far your outputs are from the expected (correct) outputs.
It has two uses:
Help the algorithm determine whether a tweak to the weights is moving in a good or a bad direction
Determine the accuracy (roughly, how often your system guesses the correct answer)
The loss here is the sum of the squared deltas, where each delta is the difference between the expected output and the actual output.
I think it's squared to magnify the errors the algorithm makes; squaring also keeps positive and negative deltas from cancelling each other out.
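The first use (telling the algorithm which direction to tweak the weights) can be sketched with plain gradient descent on the same data as in the question. The starting values W = 0.3 and b = -0.3, the learning rate, and the number of steps are assumptions for illustration; this is a hand-written loop, not TensorFlow's optimizer.

```python
x = [1.0, 2.0, 3.0, 4.0]
y = [0.0, -1.0, -2.0, -3.0]

# Assumed starting coefficients, learning rate and step count.
W, b = 0.3, -0.3
learning_rate = 0.01
losses = []

for step in range(1000):
    deltas = [W * xi + b - yi for xi, yi in zip(x, y)]
    losses.append(sum(d ** 2 for d in deltas))

    # Gradients of the sum of squared deltas with respect to W and b.
    grad_W = sum(2.0 * d * xi for d, xi in zip(deltas, x))
    grad_b = sum(2.0 * d for d in deltas)

    # Move each coefficient a small step against its gradient.
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b

print("W:", W, "b:", b)
print("first loss:", losses[0], "last loss:", losses[-1])
```

The data lie exactly on the line y = -x + 1, so the loop should drive W toward -1 and b toward 1 while the loss shrinks toward 0.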

Can anyone explain me all the parameters of sklearn SVM.SVC in a simplified manner?

I am trying to learn the SVC classifier from the SVM module in sklearn. I have learned to use it on various datasets and have even applied grid search to improve the results, but I have not yet understood some parameters, like C and gamma.
If anyone can give me a simple but detailed explanation of each parameter, it would be great.
Since we are trying to minimize some objective function, we can add some 'size' measure of the coefficient vector itself to the function. C is essentially the inverse of the weight on that 'regularization' term. Decreasing C strengthens the regularization, which helps prevent overfitting by forcing the coefficients to be sparse or small, depending on the penalty; decreasing C too much will promote underfitting, while increasing it too much will promote overfitting.
Gamma is a parameter for the RBF kernel. Increasing gamma allows for a more complex decision boundary (which can lead to overfitting, but can also improve results--it depends on the data).
This scikit-learn tutorial graphically shows the effect of changing both hyperparameters.
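To make the role of gamma concrete, here is a small plain-Python sketch of the RBF kernel, k(a, b) = exp(-gamma * ||a - b||^2). The two points and the gamma values are arbitrary choices for illustration; this is a hand-written helper, not a scikit-learn function.

```python
import math

def rbf_kernel(a, b, gamma):
    # Similarity between two points under the RBF kernel.
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)

# Two arbitrary points at squared distance 2 from each other.
a = (0.0, 0.0)
b = (1.0, 1.0)

gammas = [0.01, 0.1, 1.0, 10.0]
similarities = [rbf_kernel(a, b, g) for g in gammas]

for g, s in zip(gammas, similarities):
    print("gamma =", g, "-> k(a, b) =", s)
```

As gamma grows, the similarity between the same two points falls toward zero, so each training point influences only its close neighbourhood. That is what allows the decision boundary to become more complex.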

user defined loss function liblinear

In the Java version of LIBLINEAR there is a class called 'SolverType' in which one can choose the type of loss function to optimize, for example 'SolverType.L2LOSS_SVM_DUAL'. Is there any way to define a user-defined loss function?
The short answer is no.
The "loss function" defines the optimization problem. In fact, this parameter switches the model between, in particular:
linear regression
logistic regression
support vector machine
While the first two are quite similar, the third requires completely different and much more complex machinery to solve. In particular, one can define quite arbitrary functions that fall into the "linear models" category but are unsolvable (or solvable only by very complex techniques).
On the other hand, if the function is very simple, i.e. it is a differentiable function without any bounds (optimization is performed over the whole parameter space), then (assuming you know the analytical form of the derivatives) you can plug it into any steepest descent implementation (there are dozens of such solvers available).
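As a rough sketch of that last point, the snippet below plugs a differentiable, unconstrained loss and its analytical gradient into a hand-written steepest descent loop. The loss (logistic loss plus an L2 penalty), the toy data, the penalty weight lam and the step size are all assumptions for illustration; none of this is LIBLINEAR code.

```python
import math

# Hypothetical toy data: feature vectors with labels in {-1, +1}.
data = [((1.0, 2.0), 1), ((2.0, 1.5), 1), ((-1.0, -1.0), -1), ((-2.0, -0.5), -1)]
lam = 0.1  # assumed weight of the L2 penalty

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def loss(w):
    # User-defined loss: L2 penalty plus logistic loss of a linear model.
    penalty = 0.5 * lam * sum(wi * wi for wi in w)
    return penalty + sum(math.log1p(math.exp(-y * dot(w, x))) for x, y in data)

def grad(w):
    # Analytical gradient of the loss above.
    g = [lam * wi for wi in w]
    for x, y in data:
        scale = -y / (1.0 + math.exp(y * dot(w, x)))
        for j in range(len(w)):
            g[j] += scale * x[j]
    return g

w = [0.0, 0.0]
initial_loss = loss(w)
step_size = 0.1

for _ in range(500):
    g = grad(w)
    w = [wi - step_size * gi for wi, gi in zip(w, g)]

final_loss = loss(w)
margins = [y * dot(w, x) for x, y in data]
print("w:", w, "initial loss:", initial_loss, "final loss:", final_loss)
```

Swapping in another differentiable loss only requires replacing loss and grad; the descent loop itself stays the same.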
SVM is formulated as a QP problem.
minimize ||w|| subject to
y * (w'x) >= 1 for all (x, y) in the training dataset
This is the primal (hard-margin) form of the problem, and the objective is to minimize the L2 norm of the weight vector w.
If you change the objective ||w|| then it is no longer SVM. However, you can change the weight of training examples. You can find a tutorial here:
http://scikit-learn.org/stable/modules/svm.html#unbalanced-problems
