What is the difference between parameters and children? - pytorch

It looks like parameters and children show the same info, so what is the difference between them?
import torch
print('torch.__version__', torch.__version__)
m = torch.load('imagenet_resnet18.pth')
print(m.parameters)
print(m.children)

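model.parameters() is a generator that yields the tensors containing your model's parameters.
model.children() is a generator that yields the immediate child modules (layers) of the model, from which you can extract the parameter tensors using <layername>.weight or <layername>.bias.
Note that the code in the question prints m.parameters and m.children without calling them. That prints the bound methods, whose repr includes the whole module, which is why the two outputs look the same.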
Visit this link for a simple tutorial on accessing and freezing model layers.
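As a rough illustration of the distinction, here is a minimal sketch. The three-layer model is a hypothetical stand-in, not the ResNet-18 checkpoint from the question, and the variable names are made up for the example.

```python
import torch.nn as nn

# Hypothetical stand-in model (not the ResNet from the question).
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 2),
)

# Without parentheses you only get the bound method, whose repr embeds the module.
print(model.parameters)

# children() yields modules (layers); parameters() yields tensors.
children = list(model.children())
params = list(model.parameters())

for layer in children:
    print(type(layer).__name__)
for p in params:
    print(tuple(p.shape), p.requires_grad)

# A layer's tensors are reachable through .weight / .bias.
first = children[0]
same_weight = first.weight is params[0]

# Freezing the first layer: its tensors no longer require gradients.
for p in first.parameters():
    p.requires_grad = False
trainable = [p for p in model.parameters() if p.requires_grad]
```

Here children() gives three modules while parameters() gives four tensors (a weight and a bias for each Linear layer; ReLU has none).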

The only other answer at the time of writing is not to the point, and in my opinion misleading. From the current docs (08/23/2022):
children():
Returns an iterator over immediate children modules.
This means it does not descend into container modules such as torch.nn.Sequential or torch.nn.ModuleList: it yields the container itself, not the layers nested inside it.
parameters(recurse=True):
Returns an iterator over module parameters. This is typically passed to an optimizer.
"Passed to an optimizer" implies that nested modules are already handled: with recurse=True (the default), the parameters of all submodules are included, so you can pass the returned iterator straight to the optimizer.
For a worked example of the output of children(), see this answer on the PyTorch forum: https://discuss.pytorch.org/t/module-children-vs-module-modules/4551/4?u=raining_day513
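To make the recursion behaviour concrete, here is a small sketch with a nested container. The Net class and its layer sizes are assumptions made up for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical nested model: a Sequential container inside a custom module.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(4, 8), nn.ReLU())
        self.head = nn.Linear(8, 2)

    def forward(self, x):
        return self.head(self.features(x))

net = Net()

# children() stops at the first level; modules() walks the whole tree.
immediate = [type(m).__name__ for m in net.children()]
everything = [type(m).__name__ for m in net.modules()]

# parameters() recurses by default; recurse=False keeps only tensors owned by Net itself.
n_params = len(list(net.parameters()))
n_direct = len(list(net.parameters(recurse=False)))

# The recursive iterator can be handed to an optimizer as-is.
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

print(immediate)
print(everything)
print(n_params, n_direct)
```

Net owns no tensors directly, so parameters(recurse=False) is empty, while the default call reaches both Linear layers.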

Related

How can I extract all arguments I am passing to a TensorFlow function?

It is difficult to retrain my models on new data because I never remember my initial optimizer, loss function, and hyperparameters. How can I extract all the arguments I am passing to a TensorFlow function? For example, from the code below, how can I extract a list with the arguments learning_rate, beta_1, beta_2, and so on?
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001,
                                     beta_1=0.9, beta_2=0.999,
                                     epsilon=1e-07, amsgrad=False,
                                     name="Adam")
I just want to extract the names so that I can later access them, for example:
optimizer.learning_rate
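I have tried .keys() and .classes(), but nothing works. Of course I can inspect it using dir(optimizer), but the output is not filtered.
I just found a way. The drawback is that it requires compiling the model first. I will post it in case someone has the same issue. (Calling get_config() directly on the optimizer object, i.e. optimizer.get_config(), should also work without a model.)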
model.optimizer.get_config()

pytorch MultiheadAttention - when and where can one use weights (the second output)?

I've read through a few MultiheadAttention tutorials now, and I consistently see the weights return value being ignored, i.e.:
x, _ = myattention(q,k,v)
I've also seen a need_weights parameter that can be used to omit the second output.
This got me curious about where/when the weights are useful. Is there a scenario where the weights can be used along with, or instead of, the output in a forward pass?
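To make the second return value concrete, here is a minimal sketch of what it contains. The sizes and variable names are arbitrary assumptions, not taken from any of the tutorials mentioned.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Arbitrary sizes; batch_first=True means inputs are (batch, seq, embed).
attn = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)
q = torch.randn(2, 5, 16)   # 5 query positions
kv = torch.randn(2, 7, 16)  # 7 key/value positions

# By default the second output holds the attention weights, averaged over heads.
out, weights = attn(q, kv, kv)

# With need_weights=False the second output is None.
out_only, no_weights = attn(q, kv, kv, need_weights=False)

print(out.shape)      # one output vector per query position
print(weights.shape)  # one row of weights over the keys per query position
```

Each row of weights is a distribution over the key positions for one query, which is why it is commonly inspected or plotted rather than fed back into the forward pass.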

Build a pytorch model wrap around another pytorch model

Is it possible to wrap a PyTorch model inside another PyTorch module? I could not do it the usual transfer-learning way (simply appending some more layers), because to get the intended value for the next 'layer' I need to wait for the last layer of the first module to generate multiple outputs (say 100) and then use all of those outputs to compute the value for the next 'layer' (say, by taking their max). I tried to define the integrated model as something like the following:
import torch
import torch.nn as nn

class integrated(nn.Module):
    def __init__(self):
        super(integrated, self).__init__()

    def forward(self, x):
        device = torch.device('cpu')  # must be defined before it is used below
        model = VAE(
            encoder_layer_sizes=args.encoder_layer_sizes,
            latent_size=args.latent_size,
            decoder_layer_sizes=args.decoder_layer_sizes,
            conditional=args.conditional,
            num_labels=10 if args.conditional else 0).to(device)
        model.load_state_dict(torch.load(r'...'))  # the first model is saved somewhere else beforehand
        model.eval()
        temp = []
        for j in range(100):
            x = model(x)
            temp.append(x)
        y = max(temp)
        return y
The reason I would like to do that is the library I need to use requires the input itself to be a pytorch module. Otherwise I could simply leave the last part outside of the module.
Yes, you can definitely use a PyTorch module inside another PyTorch module. The way you are doing this in your example code is a bit unusual though, as external modules (VAE, in your case) are more often initialized in the __init__ function and then saved as attributes of the main module (integrated). Among other things, this avoids having to rebuild and reload the sub-module every time you call forward.
One other thing that looks a bit funny is your for loop over repeated invocations of model(x). If there is no randomness involved in model's evaluation, then you would only need a single call to model(x), since all 100 calls will give the same value. So assuming there is some randomness, you should consider whether you can get the desired effect by batching together 100 copies of x and using a single call to model with this batched input. This ultimately depends on additional information about why you are calling this function multiple times on the same input, but either way, using a single batched evaluation will be a lot faster than using many unbatched evaluations.
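The two suggestions above could be sketched roughly as follows. NoisyNet is a hypothetical stand-in for the pretrained VAE, Integrated and n_samples are made-up names, and the element-wise max over the sample dimension is an assumption about what "taking the max" should mean.

```python
import torch
import torch.nn as nn

class NoisyNet(nn.Module):
    """Hypothetical stand-in for the pretrained VAE: its output is stochastic."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 3)

    def forward(self, x):
        h = self.fc(x)
        return h + torch.randn_like(h)

class Integrated(nn.Module):
    def __init__(self, inner, n_samples=100):
        super().__init__()
        self.inner = inner          # stored once as a sub-module, not rebuilt in forward
        self.n_samples = n_samples

    def forward(self, x):
        batch = x.shape[0]
        # Repeat every input n_samples times and evaluate all copies in one call.
        repeated = x.repeat_interleave(self.n_samples, dim=0)
        out = self.inner(repeated).view(batch, self.n_samples, -1)
        # Element-wise max over the sample dimension (an assumption, see above).
        return out.max(dim=1).values

inner = NoisyNet()
# inner.load_state_dict(torch.load(...))  # checkpoint path is elided in the question
inner.eval()
model = Integrated(inner)
y = model(torch.randn(2, 4))
print(y.shape)
```

Because the inner model is an attribute, it shows up in model.children() and its weights in model.parameters(), so the wrapper behaves like any other module for a library that expects one.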

How to stop training some specific weights in TensorFlow

I'm just beginning to learn TensorFlow and I have some problems with it. In the training loop I want to ignore the small weights and stop training them. I've set these small weights to zero. I searched the tf API and found that tf.Variable(weight, trainable=False) can stop a weight from being trained, so I want to use it whenever the value of a weight is equal to zero. I tried to use .eval(), but it raised the exception ValueError("Cannot evaluate tensor using eval(): No default ". I have no idea how to get the value of a variable inside the training loop. Another way would be to modify tf.train.GradientDescentOptimizer(), but I don't know how to do that. Has anyone implemented this, or can anyone suggest another method? Thanks in advance!
Are you looking to apply regularization to the weights?
There is an apply_regularization method in the API that you can use to accomplish that.
See: How to exactly add L1 regularisation to tensorflow error function
I don't know of any use case for stopping the training of some variables; it's probably not what you should do.
Anyway, calling tf.Variable() (if I got you right) is not going to help you, because it's called just once when the graph is defined. The first argument is initial_value: as the name suggests, it's assigned only during initialization.
Instead, you can use tf.assign like this:
with tf.Session() as session:
    assign_op = var.assign(0)
    session.run(assign_op)
It will update the variable during the session, which is what you're asking for.

How to provide weighted eval set to XGBClassifier.fit()?

From the sklearn-style API of XGBClassifier, we can provide eval examples for early-stopping.
eval_set (list, optional) – A list of (X, y) pairs to use as a
validation set for early-stopping
However, the format only mentions a pair of features and labels. So if the doc is accurate, there is no place to provide weights for these eval examples.
Am I missing anything?
If it's not achievable in the sklearn-style, is it supported in the original (i.e. non-sklearn) XGBClassifier API? A short example will be nice, since I never used that version of the API.
As of a few weeks ago, there is a new parameter for the fit method, sample_weight_eval_set, that allows you to do exactly this. It takes a list of weight variables, i.e. one per evaluation set. I don't think this feature has made it into a stable release yet, but it is available right now if you compile xgboost from source.
https://github.com/dmlc/xgboost/blob/b018ef104f0c24efaedfbc896986ad3ed1b66774/python-package/xgboost/sklearn.py#L235
EDIT - UPDATED per conversation in comments
Given that you have a target-variable representing real-valued gain/loss values which you would like to classify as "gain" or "loss", and you would like to make sure the validation-set of the classifier weighs the large-absolute-value gains/losses heaviest, here are two possible approaches:
1. Create a custom classifier which is just an XGBRegressor fed to a threshold, where the real-valued regression predictions are converted to 1/0 or "gain"/"loss" classifications. The .fit() method of this classifier would just call .fit() of the regressor, while its .predict() method would call .predict() of the regressor and then return the thresholded category predictions.
2. You mentioned you would like to try weighting the treatment of the records in your validation set, but there is no option for this in xgboost. The way to implement this would be a custom eval metric. However, you pointed out that eval_metric must be able to return a score for a single label/pred record at a time, so it couldn't accept all your row values and perform the weighting in the eval metric. The solution you mentioned in your comment was to "create a callable which has a ref to all validation examples, pass the indices (instead of labels and scores) into eval_set, use the indices to fetch labels and scores from within the callable and return metric for each validation examples." This should also work.
I would tend to prefer option 1 as more straightforward, but trying two different approaches and comparing results is generally a good idea if you have the time, so I'd be interested to hear how these turn out for you.
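Option 1 could be sketched along these lines. To stay self-contained, the sketch uses a small least-squares regressor as a stand-in for the xgboost regressor; ThresholdedClassifier, LeastSquaresRegressor, and the synthetic data are all hypothetical names and values chosen for illustration.

```python
import numpy as np

class LeastSquaresRegressor:
    """Stand-in for the xgboost regressor; any object with fit/predict would do."""
    def fit(self, X, y):
        Xb = np.c_[X, np.ones(len(X))]  # add an intercept column
        self.coef_, *_ = np.linalg.lstsq(Xb, y, rcond=None)
        return self

    def predict(self, X):
        return np.c_[X, np.ones(len(X))] @ self.coef_

class ThresholdedClassifier:
    """Fits a regressor on real-valued targets and thresholds its predictions."""
    def __init__(self, regressor, threshold=0.0):
        self.regressor = regressor
        self.threshold = threshold

    def fit(self, X, y_real, **fit_kwargs):
        # Delegates to the regressor, forwarding any extra fit arguments.
        self.regressor.fit(X, y_real, **fit_kwargs)
        return self

    def predict(self, X):
        # 1 = "gain", 0 = "loss"
        return (self.regressor.predict(X) > self.threshold).astype(int)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
gain = X @ np.array([2.0, -1.0, 0.5])  # synthetic, noiseless gain/loss values
clf = ThresholdedClassifier(LeastSquaresRegressor()).fit(X, gain)
labels = clf.predict(X)
print(labels[:10])
```

Because the regressor is trained on the real-valued targets, large gains and losses influence the fit more than small ones, which is the effect the weighting was meant to achieve.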

Resources