Just a quick check for a question I have.
I want to build a model that generates its output based on two models F and G like so.
y = G(F(x))
where x is of course the input, and y the output.
However, first I want to update the weights of F alone, and then later update the weights of both F and G based on the value of y.
I understand that PyTorch offers a way to specify your own backprop method, but since my "method" seems to be built out of basic components, could it be that I can do this with a standard solution?
My thought is that I need a separate optimizer/loss for each of the F and G objects, but in addition to that, also some update functionality for the composite model G(F()). Can anyone confirm this as well?
If, as you suggest, the optimizers and losses for F and G can be separated, then I don't think it will be necessary to implement any additional update functionality, since you can specify the set of parameters for each optimizer, e.g.
optimizer_F = optim.SGD(F.parameters(),...)
optimizer_G = optim.SGD(G.parameters(),...)
then when you call optimizer_F.step() it will only update the parameters of F and similarly optimizer_G.step() will only update the parameters of G.
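Putting this together, a minimal sketch of the two-stage scheme you describe (the shapes, the targets target_f and target_y, and the MSE losses are all made up for illustration):

```python
import torch

F = torch.nn.Linear(4, 3)
G = torch.nn.Linear(3, 2)

optimizer_F = torch.optim.SGD(F.parameters(), lr=0.1)
optimizer_G = torch.optim.SGD(G.parameters(), lr=0.1)

x = torch.randn(8, 4)
target_f = torch.randn(8, 3)   # hypothetical target for the intermediate stage
target_y = torch.randn(8, 2)   # hypothetical target for the final output

# Stage 1: update F alone on its own loss.
optimizer_F.zero_grad()
loss_F = torch.nn.functional.mse_loss(F(x), target_f)
loss_F.backward()
optimizer_F.step()

# Stage 2: update both F and G from the composite output y = G(F(x)).
optimizer_F.zero_grad()
optimizer_G.zero_grad()
y = G(F(x))
loss_y = torch.nn.functional.mse_loss(y, target_y)
loss_y.backward()          # gradients flow through G into F
optimizer_F.step()
optimizer_G.step()
```

No custom backward is needed: autograd handles the composite G(F(x)), and the two optimizers control which parameters each step updates.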
I'm using spatstat to run some mppm models and would like to be able to calculate standard errors for the predictions, as in predict.ppm. I could of course use predict.ppm on each point process individually, but I'm wondering if this is invalid for any reason, or if there is a better way of doing so.
This is not yet implemented as an option in predict.mppm. (It is on our long list of things to do. I will move it closer to the top of the list.)
However, it is available by applying predict.ppm to each element of subfits(model), where model was the original fitted model of class mppm. Something like:
m <- mppm(......)
fits <- subfits(m)
Y <- lapply(fits, predict, se=TRUE)
Just to clarify, fits[[i]] is a point process model, of class ppm, for the data in row i of the data hyperframe, implied by the big model m. The parameter estimates and variance estimates in fits[[i]] are based on information from the entire hyperframe. This is not the same as fitting a separate model of class ppm to the data in each row of the hyperframe and calculating predictions and standard errors for those fits.
I'm trying to fit a simple Bayesian regression model to some right-skewed data. I thought I'd try setting the family to a log-normal distribution. I'm using the PyMC3 wrapper bambi. Is there a way to build a custom family with a log-normal distribution?
It depends on what you want the mean function of the model to look like.
If you want a model like

log(Y) = b0 + b1*X + e,   e ~ Normal(0, sigma)

then Yes, this is easily achieved by simply log-transforming Y and then estimating the usual linear model with a Normal response. Notice that in this model Y is an exponential function of the predictor X, so when plotting Y against X (both untransformed), the regression line can curve up or down. It also has a multiplicative error term, so that the variance is greater for larger predicted Y values. We can say that such a model has a log link function and a lognormal response.
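As a quick illustration of this first model (in plain NumPy rather than bambi; the coefficients b0, b1 and the noise level are made up), log-transforming Y reduces it to ordinary linear regression:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 200)

# Simulate Y as an exponential function of X with multiplicative error:
# Y = exp(b0 + b1*x + e), e ~ Normal(0, 0.1). b0, b1 are illustrative.
b0, b1 = 0.5, 0.3
y = np.exp(b0 + b1 * x + rng.normal(0, 0.1, size=x.shape))

# Fitting the usual linear model to log(Y) recovers b0 and b1.
slope, intercept = np.polyfit(x, np.log(y), 1)

# Predictions on the original scale curve exponentially in x.
y_hat = np.exp(intercept + slope * x)
```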
But if you want a model like

Y = b0 + b1*X + e,   e ~ Lognormal(0, sigma)

then No, this kind of model is not currently supported by bambi*. This is a model with a lognormal response but an identity link function. The regression of Y on X is a straight line, but the errors have the same lognormal distribution at every point along X, so the variance does not increase for larger predicted Y values. Note that this is an unusual model that I personally have never actually seen used.
* It's possible in theory to roll your own custom Families (although it would require some slight hacking), but the way this is designed in bambi ultimately depends on the families implemented in statsmodels.genmod, which does not currently include lognormal.
Unless I'm misunderstanding something, I think all you need to do is specify link='log' in the fit() call. If your assumption is correct, the exponentiated linear prediction will be normally distributed, and the default error distribution is gaussian, so I don't think you need to build a custom family for this—the default gaussian family with a log link should work fine. But feel free to clarify if this doesn't address your question.
I would like to know how to take gradient steps in PyTorch for a cost of the form

min over A,B of max over C of: sum_x [ d(B(A(x)), x) - d(C(A(x)), x) ]

(A, B and C are PyTorch modules whose parameters do not overlap, and d is a differentiable function.)
This is somewhat different than the cost function of a Generative Adversarial Network (GAN), so I cannot use examples for GANs off the shelf, and I got stuck while trying to adapt them for the above cost.
One approach I thought of is to construct two optimizers. Optimizer opt1 has the parameters for the modules A and B, and optimizer opt2 has the parameters of module C. One can then:
take a step for minimizing the cost function for C
run the network again with the same input to get the costs (and intermediate outputs) again
take a step with respect to A and B.
I am sure there must be a better way to do this with PyTorch (maybe using some detach operations), possibly without running the network again. Any help is appreciated.
Yes, it is possible without going through the network two times, which both wastes resources and is mathematically wrong, since the weights would have changed between the passes, and so would the loss; you would be introducing a delay, which may be interesting, but is not what you are trying to achieve.
First, create two optimizers, just as you said. Compute the loss, and then call backward. At this point, the gradients for the parameters of A, B and C have been filled in, so now you just have to call the step method of the optimizer minimizing the loss, but not of the one maximizing it. For the latter, you need to reverse the sign of the gradients of C's parameter tensors.
import torch

def d(y, x):
    return torch.pow(y.abs(), x + 1)

A = torch.nn.Linear(1, 2)
B = torch.nn.Linear(2, 3)
C = torch.nn.Linear(2, 3)

optimizer1 = torch.optim.Adam((*A.parameters(), *B.parameters()))
optimizer2 = torch.optim.Adam(C.parameters())

x = torch.rand((10, 1))
loss = (d(B(A(x)), x) - d(C(A(x)), x)).sum()

optimizer1.zero_grad()
optimizer2.zero_grad()
loss.backward()
for p in C.parameters():
    if p.grad is not None:  # In general, C is a NN, with requires_grad=False for some layers
        p.grad.data.mul_(-1)  # Update of grad.data not tracked in the computation graph
optimizer1.step()
optimizer2.step()
NB: I have not checked mathematically if the result is correct but I assume it is.
From the sklearn-style API of XGBClassifier, we can provide eval examples for early-stopping.
eval_set (list, optional) – A list of (X, y) pairs to use as a
validation set for early-stopping
However, the format only mentions a pair of features and labels. So if the doc is accurate, there is no place to provide weights for these eval examples.
Am I missing anything?
If it's not achievable in the sklearn-style API, is it supported in the original (i.e. non-sklearn) XGBoost API? A short example would be nice, since I have never used that version of the API.
As of a few weeks ago, there is a new parameter for the fit method, sample_weight_eval_set, that allows you to do exactly this. It takes a list of weight variables, i.e. one per evaluation set. I don't think this feature has made it into a stable release yet, but it is available right now if you compile xgboost from source.
https://github.com/dmlc/xgboost/blob/b018ef104f0c24efaedfbc896986ad3ed1b66774/python-package/xgboost/sklearn.py#L235
EDIT - UPDATED per conversation in comments
Given that you have a target-variable representing real-valued gain/loss values which you would like to classify as "gain" or "loss", and you would like to make sure the validation-set of the classifier weighs the large-absolute-value gains/losses heaviest, here are two possible approaches:
Create a custom classifier which is just an XGBRegressor fed to a threshold where the real-valued regression predictions are converted to 1/0 or "gain"/"loss" classifications. The .fit() method of this classifier would just call .fit() of the regressor, while the .predict() method of this classifier would call .predict() of the regressor and then return the thresholded category predictions.
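A bare-bones sketch of that wrapper (ThresholdClassifier and the toy FirstFeatureRegressor are hypothetical names; in practice the wrapped regressor would be an XGBRegressor):

```python
class ThresholdClassifier:
    """Wraps any regressor with .fit/.predict and converts its real-valued
    predictions to binary gain/loss labels at a threshold."""
    def __init__(self, regressor, threshold=0.0):
        self.regressor = regressor
        self.threshold = threshold

    def fit(self, X, y, **fit_kwargs):
        # fit_kwargs could carry eval_set, sample_weight, etc.
        self.regressor.fit(X, y, **fit_kwargs)
        return self

    def predict(self, X):
        # 1 = "gain", 0 = "loss"
        return [1 if p > self.threshold else 0
                for p in self.regressor.predict(X)]

# Stand-in regressor used only for demonstration: predicts the first feature.
class FirstFeatureRegressor:
    def fit(self, X, y):
        return self
    def predict(self, X):
        return [row[0] for row in X]

clf = ThresholdClassifier(FirstFeatureRegressor())
clf.fit([[1.2], [-0.7]], [1.2, -0.7])
preds = clf.predict([[0.5], [-0.5]])  # [1, 0]
```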
You mentioned you would like to try weighting the treatment of the records in your validation set, but there is no option for this in xgboost. The way to implement it would be through a custom eval metric. However, you pointed out that eval_metric must be able to return a score for a single label/pred record at a time, so it couldn't accept all your row values and perform the weighting inside the eval metric. The solution you mentioned in your comment was to "create a callable which has a ref to all validation examples, pass the indices (instead of labels and scores) into eval_set, use the indices to fetch labels and scores from within the callable and return metric for each validation examples." This should also work.
I would tend to prefer option 1 as more straightforward, but trying two different approaches and comparing results is generally a good idea if you have the time, so interested how these turn out for you.
In the scikit-learn kmeans source code, there is an optional argument y that can be specified (transform(X[, y])); however when I examined the source code for transform, it seems that nowhere does it deal with y in the case that it is specified. What is the purpose of this optional argument (it is not clear in the documentation either)?
As an addendum; I was wondering if there was any way to specify the centroids in the transform function if they're already computed previously. (Or if there was any other function to do this in scikit-learn).
Centroid specification
You could just overwrite kmeans_object.cluster_centers_ with your own centroids. But it might be better to pass these centers via the init argument (with n_init=1) and run some iterations.
See the available attributes in the docs.
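Since KMeans.transform just computes the distance from each sample to each cluster center, you can also reproduce it directly from precomputed centroids (transform_with_centroids is a hypothetical helper, not part of scikit-learn):

```python
import numpy as np

def transform_with_centroids(X, centroids):
    """Equivalent of KMeans.transform: Euclidean distance from each
    sample to each of the given centroids."""
    X = np.asarray(X, dtype=float)
    C = np.asarray(centroids, dtype=float)
    # (n_samples, 1, d) - (1, k, d) -> (n_samples, k, d)
    diff = X[:, None, :] - C[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

X = np.array([[0.0, 0.0], [3.0, 4.0]])
centroids = np.array([[0.0, 0.0], [3.0, 4.0]])
D = transform_with_centroids(X, centroids)  # distances, shape (2, 2)
```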
To answer your first question about the seemingly pointless argument y: you are correct. In many cases, Scikit-Learn allows users to pass a y argument that doesn't actually affect the outcome of the method.
As explained in their documentation:
y might be ignored in the case of unsupervised learning. However, to
make it possible to use the estimator as part of a pipeline that can
mix both supervised and unsupervised transformers, even unsupervised
estimators need to accept a y=None keyword argument in the second
position that is just ignored by the estimator. For the same reason,
fit_predict, fit_transform, score and partial_fit methods need to
accept a y argument in the second place if they are implemented.
So it's all to make the code easier to write. Imagine that you have a pipeline that looks like this:
step 0: some normalization
step 1: K-means to transform the data in another space
step 2: classification step
Step 1 obviously doesn't need y to work, but if you have to write the code to make the pipeline apply all of these steps, it'll be easier to simply pass X, y into all transformers, rather than having to worry about whether each individual transformer takes a y or not.
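To make the convention concrete, here is a toy transformer that follows it (IdentityTransformer is a made-up example, not a scikit-learn class):

```python
class IdentityTransformer:
    """Unsupervised transformer following the scikit-learn convention:
    fit() accepts y in second position but simply ignores it."""
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

# A pipeline driver can now call every step the same way,
# whether or not the step actually uses y:
X, y = [[1.0], [2.0]], [0, 1]
for step in [IdentityTransformer(), IdentityTransformer()]:
    X = step.fit_transform(X, y)
```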