What do we mean by 'register' in PyTorch?

I am not asking about hardware registers, the memory locations that store content.
I am asking about the usage of the word 'register' in the PyTorch documentation.
While reading the documentation on Module in PyTorch, I encountered the words 'registers' and 'registered' several times.
The context of usage is as follows:
1. tensor (Tensor) – buffer to be registered.
2. Submodules assigned in this way will be registered, and will have their parameters converted too when you call to(), etc.
3. Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
4. Registers a backward hook on the module.
5. Registers a forward hook on the module.
.....
And the word 'register' has been used in the names of several methods
1. register_backward_hook(hook)
2. register_buffer(name, tensor, persistent=True)
3. register_forward_hook(hook)
4. register_forward_pre_hook(hook)
5. register_parameter(name, param)
......
What does the word 'register' mean here, programmatically?
Does it just mean the act of recording a name or information on an official list, as in plain English, or does it have some programmatic significance?

This "register" in pytorch doc and methods names means "act of recording a name or information on an official list".
For instance, register_backward_hook(hook) adds the function hook to a list of other functions that nn.Module executes during the execution of the forward pass.
Similarly, register_parameter(name, param) adds an nn.Parameter param with name to the list of trainable parameters of the nn.Module.
It is crucial to register trainable parameters so pytorch will know what tensors to pass to the optimizer and what tensors to store as part of the nn.Module's state_dict.
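For a concrete picture, here is a minimal Python sketch (the Demo module and its attribute names are made up for illustration) of what registering buys you: registered parameters and buffers show up in state_dict() and follow the module through .to(), while a plain tensor attribute is invisible to both.

import torch
import torch.nn as nn

class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        # registered as a trainable parameter: visible to parameters() and state_dict()
        self.register_parameter("weight", nn.Parameter(torch.randn(3)))
        # registered as a buffer: saved in state_dict() and moved by .to(), but not trained
        self.register_buffer("running_mean", torch.zeros(3))
        # a plain attribute is not registered: invisible to state_dict() and .to()
        self.plain = torch.ones(3)

m = Demo()
print([name for name, _ in m.named_parameters()])  # ['weight']
print(list(m.state_dict().keys()))                 # ['weight', 'running_mean']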

Related

The meaning and implications of VK_DEPENDENCY_BY_REGION_BIT

An input attachment can be accessed by the subpassLoad GLSL function, which samples the input attachment at the current fragment position, i.e. the interface doesn't provide random access. The consequence of this is that input attachments cannot be accessed at arbitrary fragment locations.
This practically means [1]:
If a rendering technique requires reading values outside the current fragment area (which on a tiler would mean accessing rendered data outside the currently-rendering tile), separate render passes must be used.
Then, about VK_DEPENDENCY_BY_REGION_BIT the specification says [2]:
If a synchronization command includes a dependencyFlags parameter, and specifies the VK_DEPENDENCY_BY_REGION_BIT flag, then it defines framebuffer-local dependencies for the framebuffer-space pipeline stages in that synchronization command, for all framebuffer regions. If no dependencyFlags parameter is included, or the VK_DEPENDENCY_BY_REGION_BIT flag is not specified, then a framebuffer-global dependency is specified for those stages.
Hans-Kristian Arntzen from ARM [3] suggests that on tiled architectures multi-subpass render passes should be used only in conjunction with VK_DEPENDENCY_BY_REGION_BIT:
Next, we try to merge adjacent render passes together. This is particularly important on tile-based renderers. We try to merge passes together if:
They are both graphics passes
They share some color/depth/input attachments
Not more than one unique depth/stencil attachment exists
Their dependencies can be implemented with BY_REGION_BIT, i.e. no “texture” dependency, which allows sampling for arbitrary locations.
Now the questions are:
If you cannot access fragments outside of the current fragment location anyway, what is the point of VK_DEPENDENCY_BY_REGION_BIT?
On tiled architectures, does a multi-subpass render pass whose subpass dependencies cannot be declared with VK_DEPENDENCY_BY_REGION_BIT provide any performance advantage over a functionally equivalent, properly synchronized series of separate single-subpass render passes?
Well, the specification gives one example. If you want to access a sample of the input attachment that is not covered by the current fragment, then you have to use a framebuffer-global dependency (i.e. dependencyFlags = 0, or one of the vendor extensions that address this).
Though the most obvious example is non-attachment resources, which are naturally random access (you can access any pixel). With VK_DEPENDENCY_BY_REGION_BIT, only the data written for the same fragment region is ever guaranteed to be visible, while with a framebuffer-global dependency (dependencyFlags = 0) you can access a location in a storage buffer written by any fragment shader invocation of the previous subpass.
dependencyFlags = 0 is sort of a soft restart of the render pass. So, everything else being equal, I would rank the performance this way:
single subpass ≥ multiple subpasses with VK_DEPENDENCY_BY_REGION_BIT ≥ multiple subpasses without VK_DEPENDENCY_BY_REGION_BIT ≥ multiple render passes.
Whether framebuffer-global subpasses actually provide any performance advantage I cannot say without measurements on a particular implementation (and that would potentially be perishable information, changing with new GPUs or perhaps even driver versions). Though it should not be worse than a separate render pass, which is likely the worst demotion the driver itself would apply if it cannot do anything clever with such subpasses.

Specify fixed parameters and parameters to be searched in optuna (lightgbm)

I just found Optuna, and it seems it is integrated with LightGBM, but I struggle to see where I can fix parameters, e.g. scoring="auc", and where I can define a grid space to search, e.g. num_leaves=[1,2,5,10].
Using https://github.com/optuna/optuna/blob/master/examples/lightgbm_tuner_simple.py as an example, they just define a params dict with some fixed parameters (are all parameters not specified in that dict tuned?), and the documentation states that
It tunes important hyperparameters (e.g., min_child_samples and feature_fraction) in a stepwise manner
How can I control which parameters are tuned and in what space, and how can I fix some parameters?
I have no knowledge of LightGBM, but since this is the first result for fixing parameters in Optuna, I'll answer that part of the question:
In Optuna, the search space is defined within the code of the objective function. This function should take a trial object as input, and you create parameters by calling suggest_float(), suggest_int(), etc. on that trial object. For more information, see the documentation at 10_key_features/002_configurations.html.
Generally, fixing a parameter is done by hardcoding it instead of calling a suggest function, but it is also possible to fix specific parameters externally using the PartialFixedSampler.
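To make that concrete, here is a minimal sketch of a plain Optuna objective (not the stepwise LightGBMTuner from the linked example), assuming you drive lgb.train yourself; the breast-cancer dataset and the parameter ranges are just placeholders. Fixed parameters are simply hardcoded in the params dict, while searched parameters are created with trial.suggest_*:

import lightgbm as lgb
import optuna
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

def objective(trial):
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.25)
    params = {
        "objective": "binary",  # fixed: hardcoded, never tuned
        "metric": "auc",        # fixed
        "verbosity": -1,
        # searched: the space is defined by the suggest_* call itself
        "num_leaves": trial.suggest_int("num_leaves", 2, 64),
        "feature_fraction": trial.suggest_float("feature_fraction", 0.4, 1.0),
    }
    dtrain = lgb.Dataset(X_train, label=y_train)
    dvalid = lgb.Dataset(X_valid, label=y_valid)
    booster = lgb.train(params, dtrain, valid_sets=[dvalid])
    return booster.best_score["valid_0"]["auc"]

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)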

How PyTorch implements Convolution Backward?

I was reading PyTorch's source code, and I find it odd that it doesn't implement a convolution_backward function. The only convolution_backward_overrideable function directly raises an error and is apparently never supposed to be reached.
So I referred to the CuDNN / MKLDNN implementations; they both implement functions like cudnn_convolution_backward.
I have the following questions:
What are the native CUDA/CPU implementations? I can find something like thnn_conv2d_backward_out, but I could not find where this is called.
Why didn't PyTorch put a convolution_backward function in Convolution.cpp? It offers a _convolution_double_backward() function, but that is the double backward, i.e. the gradient of the gradient. Why don't they offer a single backward function?
If I want to call the native convolution / convolution_backward functions for a plain CPU/CUDA tensor, how should I write the code? Or what could I refer to? I couldn't find an example of this.
Thanks!
1- The implementation may differ depending on which backend you use: it may use the CUDA convolution implementation from some library, the CPU convolution implementation from some other library, or a custom implementation; see here: pytorch - Where is "conv1d" implemented?
2- I am not sure about the current version, but the single backward was calculated via autograd, which is why there was no explicit separate function for it. I don't know the underlying details of autograd, but you can check https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/autograd.cpp. The double_backward function is only there if you need higher-order derivatives.
3- If you want to do this in C++, the file you linked (convolution.cpp) shows you how (see the function at::Tensor _convolution...). If you inspect the function, you see it just checks which implementation to use (params.use_something...) and uses it. If you want to do this in Python, you should start tracing from conv until you reach the point where this file, convolution.cpp, is called.
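As a small addition to point 3: if the goal is only to compute convolution gradients from Python, torch.nn.grad exposes explicit conv2d gradient helpers, so you do not have to trace the C++ dispatch at all. A minimal sketch (the shapes are arbitrary):

import torch
import torch.nn.functional as F
from torch.nn import grad as nn_grad

input = torch.randn(1, 3, 8, 8)
weight = torch.randn(4, 3, 3, 3)
output = F.conv2d(input, weight, padding=1)
grad_output = torch.ones_like(output)

# explicit "backward of convolution" w.r.t. the input and the weight
grad_input = nn_grad.conv2d_input(input.shape, weight, grad_output, padding=1)
grad_weight = nn_grad.conv2d_weight(input, weight.shape, grad_output, padding=1)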
I have figured out a few things in addition to unlut's post.
The convolution methods are in separate files for different implementations. You can find cudnn_convolution_backward or mkldnn_convolution_backward easily. One tricky thing is that the final native fallback function is hard to find, because the PyTorch team is currently porting the THNN functions to ATen; you can refer to PR24507.
The native function can be found as thnn_conv2d_backward.
The convolution backward is not calculated via autograd; rather, there must be a conv_backward function, and it must be recorded in derivatives.yaml. If you want to find a specific backward function, that file is a good place to start.
Regarding the code below: if you want to call the thnn_conv2d_backward function directly, you need to explicitly construct finput and fgrad_input. These are two empty tensors serving as buffers.
// two empty buffer tensors required by the THNN signature
at::Tensor finput = at::empty({0}, input.options());
at::Tensor fgrad_input = at::empty({0}, input.options());
// kernel spatial size, taken from the weight tensor (out_ch, in_ch, kH, kW)
auto kernel_size = weight.sizes().slice(2);
auto &&result = at::thnn_conv2d_backward(grad_output, input, weight, kernel_size, stride, padding,
                                         finput, fgrad_input, output_mask);

Keyword arguments in torch.nn.Sequential (pytorch)

A question regarding keyword arguments in torch.nn.Sequential: is it possible to somehow forward keyword arguments to specific models in a sequence?
model = torch.nn.Sequential(model_0, MaxPoolingChannel(1))
res = model(input_ids_2, keyword_test=mask)
Here, keyword_test should be forwarded only to the first model.
Thanks a lot and best regards!
My duplicate from https://discuss.pytorch.org/t/keyword-arguments-in-torch-nn-sequential/53282
No; you cannot. This would only be possible if all models passed to the nn.Sequential expected the argument you are trying to pass in their forward methods (at least at the time of writing this).
Two possible workarounds (I'm not aware of your whole use case, but anticipating from the question):
If your value is static, why not initialize your first model with that value and access it during computation via self.keyword_test?
If the value is dynamic, you could carry it as a property of the input; then you can likewise access it during computation via input_ids_2.keyword_test.
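One way to make the first workaround concrete is a small wrapper module that stores the extra keyword at construction time, so the nn.Sequential itself stays keyword-free. A minimal sketch (FirstWithKeyword is a made-up name; model_0, MaxPoolingChannel and mask are the objects from the question):

import torch.nn as nn

class FirstWithKeyword(nn.Module):
    # wraps a model so only this wrapper knows about the extra keyword
    def __init__(self, model, keyword_test):
        super().__init__()
        self.model = model
        self.keyword_test = keyword_test  # static value stored at init time

    def forward(self, x):
        # the rest of the Sequential only ever sees a single positional tensor
        return self.model(x, keyword_test=self.keyword_test)

# model = nn.Sequential(FirstWithKeyword(model_0, mask), MaxPoolingChannel(1))
# res = model(input_ids_2)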

writing own data reader for tensorflow producing tensors directly

I would like to write a new data reader for TensorFlow which produces multiple feature/label tensors directly, without decoding the data from a string. I looked at the new_data_formats tutorial, but I do not want my reader class to interact through the interface
Status ReadLocked(string* key, string* value, bool* produced, bool* at_end)
since I am producing tensors directly. The reader should take a filename from a filename queue and produce multiple tensors (depending on the file size), which are then enqueued into a random batch queue. My question is: from which class should my reader inherit in order to produce the tensors? I think it is not sufficient to implement this simply as a new op, due to thread safety. I noticed that the resource_op_kernel class may be a suitable starting point.
Since this is quite deep inside TensorFlow, any pointers or additional hints on where to start and what pitfalls may lie ahead are helpful (specifically some explanation of resource management, custom ops and thread safety inside TensorFlow).
It sounds like the implementation of tf.data.FixedLengthRecordDataset would be a good place to start (C++ op implementation here). That already takes a filename and returns Tensors directly, so it sounds like you'd just want to output several Tensors rather than one.
Granted, this is using tf.data rather than queues. Probably a good idea regardless.
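For reference, if staying in Python with tf.data is acceptable, here is a minimal sketch of that route using current tf.data APIs: a filename dataset interleaved with FixedLengthRecordDataset, with the split into several tensors done in a map step. The record size, file pattern and parse logic are placeholders for whatever your format actually needs:

import tensorflow as tf

RECORD_BYTES = 1024  # placeholder record size

def parse_record(raw):
    # placeholder: turn one fixed-length record into multiple tensors
    values = tf.io.decode_raw(raw, tf.float32)
    features = values[:-1]
    label = values[-1]
    return features, label

filenames = tf.data.Dataset.list_files("data/*.bin")  # placeholder pattern
dataset = (
    filenames.interleave(
        lambda fn: tf.data.FixedLengthRecordDataset(fn, record_bytes=RECORD_BYTES),
        num_parallel_calls=tf.data.AUTOTUNE,
    )
    .map(parse_record, num_parallel_calls=tf.data.AUTOTUNE)
    .shuffle(10_000)  # plays the role of the random batch queue
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)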
