GridSearchCV fails for own model class - python-3.x

I'm trying to use a regression model I have implemented in combination with the GridSearchCV class of scikit-learn to optimize the hyperparameters of my model. My model class is built following the conventions of the scikit-learn API:
class FOO(BaseEstimator, RegressorMixin):
    def __init__(self, ...):
        # initialisation of all the parameters and hyperparameters (including the kernel function)
    def fit(self, X, y):
        # implementation of fit: takes the input and fits the parameters
    def predict(self, X):
        # implementation of predict: takes the input and calculates the result
The regression class works as it should, but strangely enough, when I study the behavior of the hyperparameters, I get inconsistencies: GridSearchCV appears to apply one hyperparameter correctly, but clearly not the other.
So I am wondering: can someone explain to me how GridSearchCV works from a technical perspective? How does it initialise the estimator, and how does it run it over the grid?
My current assumption of how GridSearchCV works and how it must be used is this:
Create a GridSearchCV instance: CVmodel = GridSearchCV(MyRegressor, param_grid=Myparamgrid, ...)
Fit the hyperparameter(s) via CVmodel.fit(X, y), which, naively, would work like this:
> Loop over parameter values
> - create an estimator instance with that parameter value (and defaults for the other params)
> - estimator.fit
> - result[parameter-value] = estimator.predict
However, experience shows that this naive idea is quite wrong, as the hyperparameter associated with the kernel function of my regressor is not correctly initialized.
Can anyone provide some insight into what GridSearchCV is truly doing?

After quite some digging I discovered that scikit-learn does not build a fresh instance for each grid point by passing the candidate values to __init__ (as one would expect in OOP); rather, it clones the estimator with its original parameters and then updates the attributes of that object via the set_params method. In my case, this worked fine for the hyperparameter that is stored directly under the same keyword in the __init__ method; however, it breaks down when the hyperparameter is only used inside __init__ to set up another attribute (here, the kernel function), because set_params does not re-run __init__. Overriding the set_params method (which many tutorials advise against) to deal with this fixes the problem.
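To illustrate the mechanism, the sketch below mimics what GridSearchCV roughly does for each candidate (clone the base estimator, then call set_params) on two hypothetical kernel ridge regressors: one that builds its kernel function in __init__ (the pattern from the question) and one that follows the scikit-learn convention of only storing constructor arguments in __init__ and deriving everything else in fit. The class names, the rbf helper and the toy data are illustrative assumptions, not the author's code.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone

def rbf(A, B, gamma):
    # Gaussian (RBF) kernel matrix between the rows of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

class StaleKernelRidge(RegressorMixin, BaseEstimator):
    """Pattern from the question: the kernel function is built in __init__."""
    def __init__(self, gamma=1.0, alpha=1.0):
        self.gamma = gamma
        self.alpha = alpha
        # captures the gamma passed to __init__; set_params(gamma=...) won't update it
        self._kernel = lambda A, B: rbf(A, B, gamma)

    def fit(self, X, y):
        self.X_fit_ = np.asarray(X, dtype=float)
        K = self._kernel(self.X_fit_, self.X_fit_)
        self.dual_coef_ = np.linalg.solve(K + self.alpha * np.eye(len(K)), y)
        return self

    def predict(self, X):
        return self._kernel(np.asarray(X, dtype=float), self.X_fit_) @ self.dual_coef_

class FixedKernelRidge(RegressorMixin, BaseEstimator):
    """scikit-learn convention: __init__ only stores params, fit derives the rest."""
    def __init__(self, gamma=1.0, alpha=1.0):
        self.gamma = gamma
        self.alpha = alpha

    def fit(self, X, y):
        self.X_fit_ = np.asarray(X, dtype=float)
        K = rbf(self.X_fit_, self.X_fit_, self.gamma)  # reads the *current* gamma
        self.dual_coef_ = np.linalg.solve(K + self.alpha * np.eye(len(K)), y)
        return self

    def predict(self, X):
        return rbf(np.asarray(X, dtype=float), self.X_fit_, self.gamma) @ self.dual_coef_

X = np.linspace(0, 1, 20).reshape(-1, 1)
y = np.sin(6 * X).ravel()

gamma_matters = {}
for cls in (StaleKernelRidge, FixedKernelRidge):
    base = cls(gamma=1.0)
    # per candidate, GridSearchCV roughly does: clone(base).set_params(**candidate)
    p_low = clone(base).set_params(gamma=1.0).fit(X, y).predict(X)
    p_high = clone(base).set_params(gamma=100.0).fit(X, y).predict(X)
    gamma_matters[cls.__name__] = not np.allclose(p_low, p_high)

print(gamma_matters)  # {'StaleKernelRidge': False, 'FixedKernelRidge': True}
```

Deriving the kernel in fit avoids overriding set_params altogether; overriding set_params to rebuild the kernel, as described above, is the alternative when the derived object has to exist before fit is called.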
For those interested in more details, I wrote this all up in a tutorial myself.


Adding Custom Loss for Validation to the Keras.History object

I have a loss function made up of several contributions, i.e. L = L1 + L2 + ...
I am in particular interested in the individual development of L1,L2... on both the training and validation data set during learning.
If I generate my model via subclassing (or the Functional API) and perform the training via model.fit(), how can I add the validation losses, e.g. "val_L1", "val_L2", ..., to the History object?
Thanks for any help
I figured it out by myself. I hope this will help someone in the future who is also struggling with this issue.
If you define your custom model as a subclass of tf.keras.Model, you have to override the methods "train_step" and "test_step" via
def train_step(...): and def test_step(...):.
"train_step" describes the training procedure used by model.fit(); "test_step" is used for evaluation, including the validation pass that model.fit() runs when validation data is given.
If both functions return:
return {'L1': L1, 'L2': L2}
the History object will automatically contain 'val_L1' and 'val_L2' (in addition to 'L1' and 'L2').
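A minimal sketch of this approach, assuming TensorFlow 2.x with its bundled Keras. The model name, the toy data and the two loss terms (MSE as L1, MAE as L2) are hypothetical stand-ins for the real contributions. Mean trackers are used so that the History entries are per-epoch averages rather than the value of the last batch.

```python
import numpy as np
import tensorflow as tf

class TwoLossModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1)
        # running means, so History stores per-epoch averages of each term
        self.l1_tracker = tf.keras.metrics.Mean(name="L1")
        self.l2_tracker = tf.keras.metrics.Mean(name="L2")

    def call(self, x):
        return self.dense(x)

    @property
    def metrics(self):
        # listing the trackers here lets Keras reset them per epoch / evaluation
        return [self.l1_tracker, self.l2_tracker]

    def _loss_terms(self, x, y):
        y_pred = self(x)
        y = tf.cast(y, y_pred.dtype)
        l1 = tf.reduce_mean(tf.square(y - y_pred))  # hypothetical term L1 (MSE)
        l2 = tf.reduce_mean(tf.abs(y - y_pred))     # hypothetical term L2 (MAE)
        return l1, l2

    def _report(self, l1, l2):
        self.l1_tracker.update_state(l1)
        self.l2_tracker.update_state(l2)
        return {m.name: m.result() for m in self.metrics}

    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            l1, l2 = self._loss_terms(x, y)
            loss = l1 + l2  # L = L1 + L2
        grads = tape.gradient(loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        return self._report(l1, l2)

    def test_step(self, data):
        # same keys as train_step; fit() prefixes them with "val_"
        x, y = data
        l1, l2 = self._loss_terms(x, y)
        return self._report(l1, l2)

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 3)).astype("float32")
y = x @ np.array([[1.0], [-2.0], [0.5]], dtype="float32")

model = TwoLossModel()
model.compile(optimizer="adam")
history = model.fit(x[:48], y[:48], validation_data=(x[48:], y[48:]),
                    epochs=2, batch_size=16, verbose=0)
print(sorted(history.history))
```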

example of doing simple prediction with pytorch-lightning

I have an existing model where I load some pre-trained weights and then do prediction (one image at a time) in pytorch. I am trying to basically convert it to a pytorch lightning module and am confused about a few things.
So currently, my __init__ method for the model looks like this:
self._load_config_file(cfg_file)
# just creates the pytorch network
self.create_network()
self.load_weights(weights_file)
self.cuda(device=0) # assumes GPU and uses one. This is probably suboptimal
self.eval() # prediction mode
From what I can gather from the Lightning docs, I can do pretty much the same, except for the cuda() call. So something like:
self.create_network()
self.load_weights(weights_file)
self.freeze() # prediction mode
So my first question: is this the correct way to use Lightning? How would Lightning know whether it needs to use the GPU? I am guessing this needs to be specified somewhere.
Now, for the prediction, I have the following setup:
def infer(frame):
    img = transform(frame)  # apply some transformation to the input
    img = torch.from_numpy(img).float().unsqueeze(0).cuda(device=0)
    with torch.no_grad():
        output = self.__call__(Variable(img)).data.cpu().numpy()
    return output
This is the bit that has me confused. Which functions do I need to override to make a lightning compatible prediction?
Also, at the moment, the input comes as a numpy array. Is that something that would be possible from the lightning module or do things always have to use some sort of a dataloader?
At some point I want to extend this model implementation to do training as well, so I want to make sure I do it right. While most examples focus on training models, a simple example of just doing prediction at production time on a single image/data point might be useful.
I am using Lightning 0.7.5 with PyTorch 1.4.0 on a GPU with CUDA 10.1.
LightningModule is a subclass of torch.nn.Module so the same model class will work for both inference and training. For that reason, you should probably call the cuda() and eval() methods outside of __init__.
Since it's just a nn.Module under the hood, once you've loaded your weights you don't need to override any methods to perform inference, simply call the model instance. Here's a toy example you can use:
import torchvision.models as models
from pytorch_lightning.core import LightningModule

class MyModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.resnet = models.resnet18(pretrained=True, progress=False)

    def forward(self, x):
        return self.resnet(x)

model = MyModel().eval().cuda(device=0)
And then to actually run inference you don't need a method, just do something like:
import torch

with torch.no_grad():  # no autograd bookkeeping is needed for inference
    for frame in video:
        img = transform(frame)
        img = torch.from_numpy(img).float().unsqueeze(0).cuda(0)
        output = model(img).cpu().numpy()
        # Do something with the output
The main benefit of PyTorch Lightning is that you can also use the same class for training by implementing training_step(), configure_optimizers() and train_dataloader() on that class. You can find a simple example of that in the PyTorch Lightning docs.
Even though the answer above suffices, take note of the following line:
img = torch.from_numpy(img).float().unsqueeze(0).cuda(0)
Both the model and the image have to be put on the right GPU, and on a multi-GPU inference machine this becomes a hassle.
To solve this, a predict API (Trainer.predict, which calls the module's predict_step) was recently added; see more at https://pytorch-lightning.readthedocs.io/en/stable/deploy/production_basic.html

How to save model architecture in PyTorch?

I know I can save a model with torch.save(model.state_dict(), FILE) or torch.save(model, FILE). But neither of them saves the architecture of the model.
So how can we save the architecture of a model in PyTorch, like creating a .pb file in TensorFlow? I want to apply different tweaks to my model. Is there a better way than copying the whole class definition every time and creating a new class, if I can't save the architecture of a model?
You can refer to this article to understand how to save the classifier. To make tweaks to a model, you can create a new model that is a child class of the existing model.
class newModel(oldModelClass):
    def __init__(self):
        super(newModel, self).__init__()
With this setup, newModel has all the layers as well as the forward function of oldModelClass. If you need to make tweaks, you can define new layers in the __init__ function and then write a new forward function to define it.
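A short sketch of that subclassing approach; OldNet, NewNet and their layer sizes are hypothetical. Because nn.Dropout has no parameters, the tweaked model's state_dict keys match the original's, so the old weights load directly into the new class.

```python
import torch
from torch import nn

class OldNet(nn.Module):  # hypothetical existing model
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 16)
        self.fc2 = nn.Linear(16, 2)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

class NewNet(OldNet):
    def __init__(self, p=0.5):
        super().__init__()          # inherits fc1 and fc2
        self.drop = nn.Dropout(p)   # new layer added by the tweak

    def forward(self, x):           # new forward that reuses the inherited layers
        return self.fc2(self.drop(torch.relu(self.fc1(x))))

old = OldNet()
new = NewNet()
new.load_state_dict(old.state_dict())  # keys match: Dropout adds no parameters
out = new.eval()(torch.randn(3, 8))    # eval() turns dropout off
```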
Saving all the parameters (state_dict) and all the Modules is not enough, since there are operations that manipulate the tensors but are only reflected in the actual code of the specific implementation (e.g., reshaping in ResNet).
Furthermore, the network might not have a fixed and pre-determined compute graph: You can think of a network that has branching or a loop (recurrence).
Therefore, you must save the actual code.
Alternatively, if there are no branches/loops in the net, you may save the computation graph, see, e.g., this post.
You should also consider exporting your model using onnx and have a representation that captures both the trained weights as well as the computation graph.
Regarding the actual question:
So how can we save the architecture of a model in PyTorch like creating a .pb file in Tensorflow ?
The answer is: You cannot
Is there any way to load a trained model without declaring the class definition before ?
I want the model architecture as well as parameters to be loaded.
No, you have to load the class definition first; this is a limitation of Python pickling.
https://discuss.pytorch.org/t/how-to-save-load-torch-models/718/11
There are, however, other options (you have probably already seen most of them) listed in this PyTorch tutorial:
https://pytorch.org/tutorials/beginner/saving_loading_models.html
PyTorch's way of serializing a model for inference is to use torch.jit to compile the model to TorchScript.
PyTorch's TorchScript supports more advanced control flows than TensorFlow, and thus the serialization can happen either through tracing (torch.jit.trace) or compiling the Python model code (torch.jit.script).
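A minimal sketch of the scripting route, assuming a PyTorch version that still ships torch.jit. The Gate module is hypothetical and contains a data-dependent branch, which torch.jit.script preserves (torch.jit.trace would only record the branch taken for the example input). The saved archive is loaded with torch.jit.load without the Python class definition.

```python
import io
import torch
from torch import nn

class Gate(nn.Module):  # hypothetical module with data-dependent control flow
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)

    def forward(self, x):
        if x.sum() > 0:  # kept as a real branch by scripting
            return self.fc(x)
        return -x

scripted = torch.jit.script(Gate())
buf = io.BytesIO()               # a path such as "model.pt" works the same way
torch.jit.save(scripted, buf)
buf.seek(0)
loaded = torch.jit.load(buf)     # code + weights; the Gate class is not needed here

x = torch.ones(2, 4)
```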
Great references:
Video which explains this: https://www.youtube.com/watch?app=desktop&v=2awmrMRf0dA
Documentation: https://pytorch.org/docs/stable/jit.html

XGBoost get classifier object from booster object?

I usually get to feature importance using
regr = XGBClassifier()
regr.fit(X, y)
regr.feature_importances_
where type(regr) is <class 'xgboost.sklearn.XGBClassifier'>.
However, I have a pickled XGBoost model which, when unpickled, is an object of type <class 'xgboost.core.Booster'>. This is the same kind of object I would get from regr.get_booster().
I have found a few solutions for getting variable importance from a booster object, but is there a way to get to the classifier object from the booster object so I can just apply the same feature_importances_ command? That seems like the most straightforward solution; otherwise it seems I have to write a function that mimics the output of feature_importances_ so that it fits my logged feature importances...
So ideally I'd have something like
import pickle

with open("xgboost-model", "rb") as f:
    xgb_booster = pickle.load(f)
assert str(type(xgb_booster)) == "<class 'xgboost.core.Booster'>", 'wrong class'
xgb_classifier = xgb_booster.get_classifier()  # wished-for method; Booster has no such method
xgb_classifier.feature_importances_
Are there any limitations on what can be done with a booster object in terms of finding the classifier? I figure there's some combination of save/load/dump that will get me what I need, but I'm stuck for now...
Also, for context, the pickled model is the output from AWS SageMaker, so I'm just unpacking it to do some further evaluation.
Based on my own experience trying to recreate a classifier from a booster object generated by SageMaker I learned the following:
It doesn't appear to be possible to recreate the classifier from the booster. :(
https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.Booster has the details on the booster class so you can review what it can do.
Crazy things you can do however:
You can create a classifier object and then over-ride the booster within it:
import xgboost as xgb

xgb_classifier = xgb.XGBClassifier(**xgboost_params)
[..]
xgb_classifier._Booster = booster  # private attribute where the sklearn wrapper keeps its Booster
This is nearly useless unless you fit it; otherwise it doesn't have any feature data. (I didn't go all the way through this scenario to validate whether fitting would provide the feature data required for it to be functional.)
You can remove the booster object from the classifier and then pickle the classifier using xgboost directly. Then later restore the SageMaker booster back into it. This abomination is closer and appears to work, but is not truly a rehydrated classifier object from the SageMaker output alone.
Recommendation
If you’re not stuck using the SageMaker training solution you can certainly use XGBoost directly to train with. At that point you have access to everything you need to dump/save the data for use in a different context.
I know you're after feature importance so I hope this gets you closer, I had a different use case and was ultimately able to leverage the booster for what I needed.
I was able to get an xgboost.XGBClassifier model virtually identical to an xgboost.Booster model by
(1) extracting all tuning parameters from the booster model using this:
import json
json.loads(your_booster_model.save_config())
(2) implementing these same tuning parameters and then training a XGBClassifier model using the same training dataset used to train the Booster model before that.
Note: one mistake I made was that I forgot to explicitly assign the same seed/random_state in both the Booster and Classifier versions.

Using PyTorch for scientific computation

I would like to use PyTorch as a scientific computation package. It has much to recommend it in that respect - its Tensors are basically GPU-accelerated numpy arrays, and its autograd mechanism is potentially useful for a lot of things besides neural networks.
However, the available tutorials and documentation seem strongly geared towards quickly getting people up and running using it for machine learning. Although there is lots of good information available on the Tensor and Variable classes (and I understand that material reasonably well), the nn and optim packages always seem to be introduced by example rather than by explaining the API, which makes it hard to figure out exactly what's going on.
My main question at this point is whether I can use the optim package without also using the nn package, and if so how to do so. Of course I can always implement my simulations as subclasses of nn.Module even though they are not neural networks, but I would like to understand what happens under the hood when I do this, and what benefits/drawbacks it would give for my particular application.
More broadly, I would appreciate pointers to any resource that gives more of a logical overview of the API (for nn and optim specifically), rather than just presenting examples.
This is a partial self-answer to the specific question about using optim without using nn. The answer is, yes, you can do that. In fact, from looking at the source code, the optim package doesn't know anything about nn and only cares about Variables and tensors.
The documentation gives the following incomplete example:
optimizer = optim.Adam([var1, var2], lr = 0.0001)
and then later:
for input, target in dataset:
    optimizer.zero_grad()
    output = model(input)
    loss = loss_fn(output, target)
    loss.backward()
    optimizer.step()
The function model isn't defined anywhere and looks like it might be something to do with nn, but in fact it can just be a Python function that computes output from input using var1 and var2 as parameters, as long as all the intermediate steps are done using Variables so that it can be differentiated. The call to optimizer.step() will update the values of var1 and var2 automatically.
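A minimal sketch of this with no nn involved: two plain scalar tensors with requires_grad=True (what Variables became in PyTorch 0.4 and later) are fitted to hypothetical straight-line data by an ordinary Python function model. Plain SGD is swapped in for the Adam of the documentation example because it converges cleanly on this small least-squares problem; the training loop is otherwise the same.

```python
import torch

# hypothetical "experiment": observations from y = 2 t + 1
t = torch.linspace(0, 1, 11)
y_obs = 2.0 * t + 1.0

# plain tensors as parameters; no nn.Module anywhere
var1 = torch.zeros((), requires_grad=True)
var2 = torch.zeros((), requires_grad=True)

def model(t):
    # ordinary Python function; autograd tracks the tensor operations
    return var1 * t + var2

def loss_fn(output, target):
    return ((output - target) ** 2).mean()

optimizer = torch.optim.SGD([var1, var2], lr=0.5)
for _ in range(1000):
    optimizer.zero_grad()
    loss = loss_fn(model(t), y_obs)
    loss.backward()
    optimizer.step()  # updates var1 and var2 in place

print(var1.item(), var2.item())
```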
In terms of the structure of PyTorch overall, it seems that optim and nn are independent of one another, with nn being basically just a convenient way to chain differentiable functions together, along with a library of such functions that are useful in machine learning. I would still appreciate pointers to a good technical overview of the whole package, though.
