How PyTorch implements Convolution Backward? - pytorch

I read about the Pytorch's source code, and I find it's weird that it doesn't implement the convolution_backward function, The only convolution_backward_overrideable function is directly raises an error and supposed not to fall here.
So I referred to CuDNN / MKLDNN implementation, they both implements functions like cudnn_convolution_backward.
I got the following question:
What are the native implementation of CUDA/ CPU? I can find something like thnn_conv2d_backward_out, but I could not find where are this is called.
Why PyTorch didn't put the convolution_backward function in Convolution.cpp? It offers an _convolution_double_backward() function. But this is the double backward, it's the gradient of gradient. Why don't they offer a single backward function?
If I want to call the native convolution/ convolution_backward function for my pure cpu/cuda tensor, how should I write code? Or where could I refer to? I couldn't find example for this.
Thanks !

1- Implementation may differ depending on which backend you use, it may use CUDA convolution implementation from some library, CPU convolution implementation from some other library, or custom implementation, see here: pytorch - Where is “conv1d” implemented?.
2- I am not sure about the current version, but single backward was calculated via autograd, that is why there was not an explicit different function for it. I don't know the underlying details of autograd but you can check https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/autograd.cpp. That double_backward function is only there if you need higher order derivatives.
3- If you want to do this in C, the file you linked (convolution.cpp) shows you how to do this (function at::Tensor _convolution...). If you inspect the function you see it just checks which implementation to use (params.use_something...) and use it. If you want to do this in python you should start tracing from conv until where this file convolution.cpp is called.

I have figured something addition to #unlut's post.
The convolution method are in separate files for different implementations. You may find cudnn_convoluton_backward or mkldnn_convolution_backward easily. One tricky thing is that the final native fall function is hard to find. It is because currently Pytorch Teams are porting Thnn function to ATen, you could refer to PR24507.
The native function could be find as thnn_con2d_backward.
The convolution backward is not calculated via autograd, rather, there must a conv_backward function and this must be recorded in derivatives.yaml. If you want to find specific backward function, refer to that file is a good start.
About this code, if you want to directly call thnn_backward function, you need to explicitly construct finput and fgrad_input. These are two empty tensor offering as a buffer.
at::Tensor finput = at::empty({0},input.options());
at::Tensor fgrad_input = at::empty({0}, input.options());
auto kernel_size = weight.sizes().slice(2);
auto &&result = at::thnn_conv2d_backward(grad_output, input, weight,kernel_size , stride, padding,
finput, fgrad_input, output_mask);

Related

Specify fixed parameters and parameters to be search in optuna (lightgbm)

I just found Optuna and it seems they are integrated with lightGBM, but I struggle to see where I can fix parameters, e.g scoring="auc" and where I can define a gridspace to search, e.g num_leaves=[1,2,5,10].
Using https://github.com/optuna/optuna/blob/master/examples/lightgbm_tuner_simple.py as example, they just define a params dict with some fixed parameters (are all parameters not specified in that dict tuned?), and the documentation states that
It tunes important hyperparameters (e.g., min_child_samples and feature_fraction) in a stepwise manner
How can I controll which parameters are tuned and in what space, and how can I fix some parameters?
I have no knowledge of LightGBM, but since this is the first result for fixing parameters in optuna, I'll answer that part of the question:
In optuna, the search space is defined within the code of the objective function. This function should take a 'trials' object as an input, and you can create parameters by calling the suggest_float(), suggest_int() etc. functions on that trials object. For more information, see the documentation at 10_key_features/002_configurations.html
Generally, fixing a parameter is done by hardcoding it instead of calling a suggest function, but it is possible to fix specific parameters externally using the PartialFixedSampler

In Python, is what are the differences between a method outside a class definition, or a method in it using staticmethod?

I have been working a a very dense set of calculations. It all is to support a specific problem I have.
But the nature of the problem is no different than this. Suppose I develop a class called 'Matrix' that has the machinery to implement matrices. Instantiation would presumably take a list of lists, which would be the matrix entries.
Now I want to provide a multiply method. I have two choices. First, I could define a method like so:
class Matrix():
def __init__(self, entries)
# do the obvious here
return
def determinant(self):
# again, do the obvious here
return result_of_calcs
def multiply(self, b):
# again do the obvious here
return
If I do this, the call signature for two matrix objects, a and b, is
a.multiply(b)...
The other choice is a #staticmethod. Then, the definition looks like:
#staticethod
def multiply(a,b):
# do the obvious thing.
Now the call signature is:
z = multiply(a,b)
I am unclear when one is better than the other. The free-standing function is not truly part of the class definition, but who cares? it gets the job done, and because Python allows "reaching into an object" references from outside, it seems able to do everything. In practice they'll (the class and the method) end up in the same module, so they're at least linked there.
On the other hand, my understanding of the #staticmethod approach is that the function is now part of the class definition (it defines one of the methods), but the method gets no "self" passed in. In a way this is nice because the call signature is the much better looking:
z = multiply(a,b)
and the function can access all the instances' methods and attributes.
Is this the right way to view it? Are there strong reasons to do one or the other? In what ways are they not equivalent?
I have done quite a bit of Python programming since answering this question.
Suppose we have a file named matrix.py, and it has a bunch of code for manipulating matrices. We want to provide a matrix multiply method.
The two approaches are:
define a free:standing function with the signature multiply(x,y)
make it a method of all matrices: x.multiply(y)
Matrix multiply is what I will call a dyadic function. In other words, it always takes two arguments.
The temptation is to use #2, so that a matrix object "carries with it everywhere" the ability to be multiplied. However, the only thing it makes sense to multiply it with is another matrix object. In such cases there are two equally good ways to do that, viz:
z=x.multiply(y)
or
z=y.multiply(x)
However, a better way to do it is to define a function inside the file that is:
multiply(x,y)
multiply(), as such, is a function any code using the 'library' expects to have available. It need not be associated with each matrix. And, since the user will be doing an 'import', they will get the multiply method. This is better code.
What I was wrongly confounding was two things that led me to the method attached to every object instance:
Functions which need to be generally available inside the file that should be
exposed outside it; and
Functions which are needed only inside the file.
multiply() is an example of type 1. Any matrix 'library' ought to likely define matrix multiplication.
What I was worried about was needing to expose all the 'internal' functions. For example, suppose we want to make externally available matrix add(), multiple() and invert(). Suppose, however, we did not want to make externally available - but needed inside - determinant().
One way to 'protect' users is to make determinant a function (a def statement) inside the class declaration for matrices. Then it is protected from exposure. However, nothing stops a user of the code from reaching in if they know the internals, by using the method matrix.determinant().
In the end it comes down to convention, largely. It makes more sense to expose a matrix multiply function which takes two matrices, and is called like multiply(x,y). As for the determinant function, instead of 'wrapping it' in the class, it makes more sense to define it as __determinant(x) at the same level as the class definition for matrices.
You can never truly protect internal methods by their declaration, it seems. The best you can do is warn users. the "dunder" approach gives warning 'this is not expected to be called outside the code in this file'.

How to do this in Tensorflow 2.0?

I am trying for the first time the Tensorflow 2.0.
Is this idiomatic?
#tf.function
def add(a,b):
return a+b
if __name__=="__main__":
result=add(1.0,2.0)
print(result)
print(tf.keras.backend.get_value(result))
However, I get this warning related to the add function:
WARNING:tensorflow:Entity <function add at 0x7ff34781a2f0> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause:
What does it mean?
How can I correct this?
I didn't get the warning in the latest tf-nightly (2.1.0-dev20200104). But there are two pieces of advice regarding your code,
As #thushv89 pointed out, it is generally a good idea to pass Tensor objects
when calling functions decorated by #tf.function. In some cases, passing
plain Python data types (like floats and ints) may cause the function
to be "recompiled", dramatically slowing down the performance. It doesn't happen
in this case. But it's good to be careful in general.
In TF2's eager execution, tf.keras.backend.get_value(result) is a no-op.
You can omit that call. result is the Tensor value per se and it holds the
concrete values.

How to create a custom Kernel for a Gaussian Process Regressor in scikit-learn?

I am looking into using a GPR for a rather peculiar context, where I need to write my own Kernel. However I found out there's no documentation about how to do this. Trying to simply inherit from Kernel and implementing the methods __call__, get_params, diag and is_stationary is enough to get the fitting process to work, but then breaks down when I try to predict y values and standard deviations. What are the necessary steps to build a minimal but functional class that inherits from Kernel while using its own function? Thanks!
Depending on how exotic your kernel will be, the answer to your question may be different.
I find the implementation of the RBF kernel quite self-documenting, so I use it as reference. Here is the gist:
class RBF(StationaryKernelMixin, NormalizedKernelMixin, Kernel):
def __init__(self, length_scale=1.0, length_scale_bounds=(1e-5, 1e5)):
self.length_scale = length_scale
self.length_scale_bounds = length_scale_bounds
#property
def hyperparameter_length_scale(self):
if self.anisotropic:
return Hyperparameter("length_scale", "numeric",
self.length_scale_bounds,
len(self.length_scale))
return Hyperparameter(
"length_scale", "numeric", self.length_scale_bounds)
def __call__(self, X, Y=None, eval_gradient=False):
# ...
As you mentioned, your kernel should inherit from Kernel, which requires you to implement __call__, diag and is_stationary. Note, that sklearn.gaussian_process.kernels provides StationaryKernelMixin and NormalizedKernelMixin, which implement diag and is_stationary for you (cf. RBF class definition in the code).
You should not overwrite get_params! This is done for you by the Kernel class and it expects scikit-learn kernels to follow a convention, which your kernel should too: specify your parameters in the signature of your constructor as keyword arguments (see length_scale in the previous example of RBF kernel). This ensures that your kernel can be copied, which is done by GaussianProcessRegressor.fit(...) (this could be the reason that you could not predict the standard deviation).
At this point, you may notice the other parameter length_scale_bounds. That is only a constraint on the actual hyper parameter length_scale (cf. constrained optimization). This brings us to the fact, that you need to also declare your hyper parameters, that you want optimized and need to compute gradients for in your __call__ implementation. You do that by defining a property of your class that is prefixed by hyperparameter_ (cf. hyperparameter_length_scale in the code). Each hyper parameter that is not fixed (fixed = hyperparameter.fixed == True) is returned by Kernel.theta, which is used by GP on fit() and to compute the marginal log likelihood. So this is essential if you want to fit parameters to your data.
One last detail about Kernel.theta, the implementation states:
Returns the (flattened, log-transformed) non-fixed hyperparameters.
So you should be careful with 0 values in your hyper parameters as they can end up as np.nan and break stuff.
I hope this helps, even though this question is already a bit old. I have actually never implemented a kernel myself, but was eager to skim the sklearn code base. It is unfortunate that there are no official tutorials on that, the code base, however, is quite clean and commented.

writing own data reader for tensorflow producing tensors directly

I like to write a new data reader for tensorflow which produces multiple feature/label tensors directly w/o decoding the data from a string. I looked at the new_data_formats tutorial yet I do not want my own reader class to interact through the interface
Status ReadLocked(string* key, string* value, bool* produced, bool* at_end)
since I am producing tensors directly. The reader should take a filename from a filename queue and produce multiple tensors (depending on the filesize) which are then enqueued into a random batch queue. My question is: From which class should my reader inherit to produce the tensors? I think it is not sufficient to implement this simply as a new op due to thread safety. I noticed that the resource_op_kernel class maybe a suitable starting point.
Since this is quite deep inside tensorflow any pointers or additional hints where to start and what pitfalls may lie ahead are helpful (specifically some explanations about resource management, custom ops & thread-safety inside tensorflow).
It sounds like the implementation of tf.data.FixedLengthRecordDataset would be a good place to start (C++ op implementation here). That already takes a filename and returns Tensors directly, so it sounds like you'd just want to output several Tensors rather than one.
Granted that this is using tf.data rather than queues. Probably a good idea regardless.

Resources