I just found Optuna and it seems it is integrated with LightGBM, but I struggle to see where I can fix parameters, e.g. scoring="auc", and where I can define a grid space to search, e.g. num_leaves=[1,2,5,10].
Using https://github.com/optuna/optuna/blob/master/examples/lightgbm_tuner_simple.py as an example, they just define a params dict with some fixed parameters (are all parameters not specified in that dict tuned?), and the documentation states that
It tunes important hyperparameters (e.g., min_child_samples and feature_fraction) in a stepwise manner
How can I control which parameters are tuned and in what space, and how can I fix some parameters?
I have no knowledge of LightGBM, but since this is the first result for fixing parameters in optuna, I'll answer that part of the question:
In Optuna, the search space is defined within the code of the objective function. This function should take a trial object as input, and you can create parameters by calling the suggest_float(), suggest_int(), etc. functions on that trial object. For more information, see the documentation at 10_key_features/002_configurations.html.
Generally, fixing a parameter is done by hardcoding it instead of calling a suggest function, but it is also possible to fix specific parameters externally using the PartialFixedSampler.
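For example, a minimal sketch of an objective that fixes the objective and metric while searching num_leaves over the grid from the question (train_and_evaluate is a placeholder for your own LightGBM training/CV code):

import optuna

def objective(trial):
    params = {
        "objective": "binary",   # fixed: hardcoded instead of suggested
        "metric": "auc",         # fixed: hardcoded instead of suggested
        # tuned: one value from this grid is suggested per trial
        "num_leaves": trial.suggest_categorical("num_leaves", [1, 2, 5, 10]),
        # tuned: a continuous range instead of a grid
        "feature_fraction": trial.suggest_float("feature_fraction", 0.4, 1.0),
    }
    return train_and_evaluate(params)   # placeholder: return the AUC for these params

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)

Note that this is the plain Optuna API rather than the LightGBMTuner from the linked example, which chooses its own search space stepwise.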
I was reading PyTorch's source code and find it odd that it doesn't implement a convolution_backward function. The only convolution_backward_overrideable function directly raises an error and is apparently not supposed to be reached.
So I referred to the CuDNN / MKLDNN implementations; they both implement functions like cudnn_convolution_backward.
I have the following questions:
What are the native CUDA/CPU implementations? I can find something like thnn_conv2d_backward_out, but I could not find where it is called.
Why didn't PyTorch put a convolution_backward function in Convolution.cpp? It offers an _convolution_double_backward() function, but that is the double backward, i.e. the gradient of the gradient. Why don't they offer a single backward function?
If I want to call the native convolution / convolution_backward functions for my pure CPU/CUDA tensors, how should I write the code? Or where could I look? I couldn't find an example of this.
Thanks!
1- The implementation may differ depending on which backend you use: it may use the CUDA convolution implementation from one library, the CPU convolution implementation from another library, or a custom implementation; see here: pytorch - Where is “conv1d” implemented?.
2- I am not sure about the current version, but the single backward was calculated via autograd; that is why there was no separate explicit function for it. I don't know the underlying details of autograd, but you can check https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/autograd.cpp. That double_backward function is only there if you need higher-order derivatives.
3- If you want to do this in C++, the file you linked (convolution.cpp) shows you how (the function at::Tensor _convolution...). If you inspect that function you see it just checks which implementation to use (params.use_something...) and uses it. If you want to do this in Python, you should start tracing from conv down to where this file, convolution.cpp, is called.
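To illustrate points 2 and 3 from Python: you normally never call a backward function explicitly; autograd records the conv2d op during the forward pass and dispatches to whichever backward kernel is registered for the current backend. A small sketch:

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8, requires_grad=True)
w = torch.randn(4, 3, 3, 3, requires_grad=True)

y = F.conv2d(x, w, padding=1)

# autograd built the graph during the forward pass; this call triggers the
# convolution backward implementation selected for this backend.
grad_x, grad_w = torch.autograd.grad(y.sum(), (x, w))

print(grad_x.shape, grad_w.shape)  # (1, 3, 8, 8) and (4, 3, 3, 3)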
I have figured out something in addition to @unlut's post.
The convolution methods are in separate files for different implementations. You may find cudnn_convolution_backward or mkldnn_convolution_backward easily. One tricky thing is that the final native fallback function is hard to find, because currently the PyTorch team is porting THNN functions to ATen; you could refer to PR24507.
The native function can be found as thnn_conv2d_backward.
The convolution backward is not calculated via autograd; rather, there must be a conv_backward function, and it must be recorded in derivatives.yaml. If you want to find a specific backward function, that file is a good place to start.
About the code below: if you want to call the thnn_conv2d_backward function directly, you need to explicitly construct finput and fgrad_input, two empty tensors serving as buffers.
// Buffers required by the THNN entry point (left empty; they are filled internally).
at::Tensor finput = at::empty({0}, input.options());
at::Tensor fgrad_input = at::empty({0}, input.options());
// Kernel spatial size is the weight shape without the output/input channel dims.
auto kernel_size = weight.sizes().slice(2);
auto &&result = at::thnn_conv2d_backward(grad_output, input, weight, kernel_size,
                                         stride, padding, finput, fgrad_input,
                                         output_mask);
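If you only need the convolution gradients from Python (rather than calling the THNN entry point from C++ as above), torch.nn.grad exposes conv2d_input and conv2d_weight, which avoid constructing the buffers by hand; a small sketch:

import torch
import torch.nn.grad as nn_grad

input = torch.randn(1, 3, 8, 8)
weight = torch.randn(4, 3, 3, 3)
grad_output = torch.randn(1, 4, 8, 8)   # gradient arriving from the conv output

# Gradient of the convolution w.r.t. its input and its weight.
grad_input = nn_grad.conv2d_input(input.shape, weight, grad_output, padding=1)
grad_weight = nn_grad.conv2d_weight(input, weight.shape, grad_output, padding=1)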
I am using Python's MIP module for optimization. I have set up a model with a few parameters. I want to limit the number of solutions and add a stop timer so the search ends if no solution is found in the given time. I have added these parameters as given below:
m = Model(name='opt', sense=MAXIMIZE, solver_name=CBC)
m.optimize(max_solutions=1, max_seconds=300)
Somehow neither of them seems to work for me. It does not even stop looking for a solution after the given time, and it sometimes returns 2 solutions even when I want to limit it to 1. Is there something I am missing in the syntax?
One more thing: Gurobi has an option to add multiple variables using the addVars method. Is there anything similar available in MIP too?
Thanks.
Yeah, I have been doing some tests myself (with the Python-MIP package) and have seen some similar issues. Apparently it's still quite new, and many improvements have been implemented recently or are yet to be developed. I will post what I've learned:
Regarding max_seconds: there has been at least one (closed) issue on the official repo related to using the max_seconds parameter with CBC.
Regarding max_solutions: if you are using version 1.6.2 or earlier, here's an explanation: up to 1.6.1, m.optimize(max_solutions=1) wasn't passing the maximum-solutions parameter to CBC. In that case you should try the following lines (or just update to the current version):
m.max_solutions = 1
m.optimize()
If the above doesn't help with the max_seconds and max_solutions parameters, I suggest posting your question as an issue on the library's repo to get answers and support from the project contributors.
Adding multiple variables, similar to gurobipy's Model.addVars() method: yes, you can do it as follows:
p = {(i, j): m.add_var(var_type=CONTINUOUS, lb=0, name="p[%d,%d]" % (i, j))
for i in Set_i for j in Set_j}
In this example, we are adding a variable p_ij and specifying that it is continuous (vs. binary or integer), has lower bound 0, and is indexed over the sets Set_i and Set_j, which are Python lists. See the documentation here for a more detailed explanation of how to use it. Similarly, you can create indexed constraints using the add_constr method, analogous to Gurobi's Model.addConstrs() method.
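For example, a small sketch of indexed constraints, using made-up index sets and capacities purely for illustration:

from mip import Model, MAXIMIZE, CONTINUOUS, CBC, xsum

Set_i = [0, 1, 2]
Set_j = [0, 1]
cap = {0: 10, 1: 20}   # made-up right-hand sides

m = Model(name='opt', sense=MAXIMIZE, solver_name=CBC)

# Indexed variables, as in the dict comprehension above.
p = {(i, j): m.add_var(var_type=CONTINUOUS, lb=0, name="p[%d,%d]" % (i, j))
     for i in Set_i for j in Set_j}

# One constraint per j, similar to gurobipy's Model.addConstrs().
for j in Set_j:
    m.add_constr(xsum(p[i, j] for i in Set_i) <= cap[j], name="cap[%d]" % j)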
I have been working on a very dense set of calculations, all to support a specific problem I have.
But the nature of the problem is no different from this: suppose I develop a class called 'Matrix' that has the machinery to implement matrices. Instantiation would presumably take a list of lists, which would be the matrix entries.
Now I want to provide a multiply method. I have two choices. First, I could define a method like so:
class Matrix:
    def __init__(self, entries):
        # do the obvious here
        self.entries = entries
        return

    def determinant(self):
        # again, do the obvious here
        return result_of_calcs

    def multiply(self, b):
        # again do the obvious here
        return
If I do this, the call signature for two matrix objects, a and b, is
a.multiply(b)...
The other choice is a @staticmethod. Then, the definition looks like:
@staticmethod
def multiply(a, b):
    # do the obvious thing.
    return
Now the call signature is:
z = Matrix.multiply(a, b)
I am unclear on when one is better than the other. The free-standing function is not truly part of the class definition, but who cares? It gets the job done, and because Python allows "reaching into an object" from outside, it seems able to do everything. In practice they (the class and the function) will end up in the same module, so they're at least linked there.
On the other hand, my understanding of the @staticmethod approach is that the function is now part of the class definition (it defines one of the methods), but the method gets no "self" passed in. In a way this is nice because the call signature is the much better-looking:
z = Matrix.multiply(a, b)
and the function can access all the instances' methods and attributes through its arguments.
Is this the right way to view it? Are there strong reasons to do one or the other? In what ways are they not equivalent?
I have done quite a bit of Python programming since first asking this question.
Suppose we have a file named matrix.py, and it has a bunch of code for manipulating matrices. We want to provide a matrix multiply method.
The two approaches are:
define a free-standing function with the signature multiply(x, y)
make it a method of all matrices: x.multiply(y)
Matrix multiply is what I will call a dyadic function. In other words, it always takes two arguments.
The temptation is to use #2, so that a matrix object "carries with it everywhere" the ability to be multiplied. However, the only thing it makes sense to multiply it with is another matrix object. In such cases there are two equally good ways to do that, viz:
z = x.multiply(y)
or
z = y.multiply(x)
However, a better way to do it is to define a function inside the file that is:
multiply(x,y)
multiply(), as such, is a function any code using the 'library' expects to have available. It need not be attached to each matrix instance. And, since the user will be doing an 'import', they will get the multiply function. This is better code.
What I was wrongly conflating were two things that led me to attach the method to every object instance:
Functions which need to be generally available inside the file and should be exposed outside it; and
Functions which are needed only inside the file.
multiply() is an example of the first type. Any matrix 'library' ought to define matrix multiplication.
What I was worried about was needing to expose all the 'internal' functions. For example, suppose we want to make add(), multiply() and invert() externally available. Suppose, however, that determinant() is needed internally but should not be exposed externally.
One way to 'protect' users is to make determinant a function (a def statement) inside the class declaration for matrices. Then it is somewhat protected from exposure. However, nothing stops a user of the code from reaching in if they know the internals, by calling the method matrix.determinant().
In the end it comes down largely to convention. It makes more sense to expose a matrix multiply function which takes two matrices and is called like multiply(x, y). As for the determinant function, instead of 'wrapping' it in the class, it makes more sense to define it as _determinant(x) (with a leading underscore) at the same level as the class definition for matrices.
You can never truly protect internal functions by how they are declared, it seems. The best you can do is warn users: the leading-underscore convention signals 'this is not expected to be called outside the code in this file'.
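Putting that together, a minimal sketch of how matrix.py might be laid out under these conventions (the bodies are only illustrative):

# matrix.py -- sketch of the module layout described above

class Matrix:
    def __init__(self, entries):
        # entries is a list of lists holding the matrix values
        self.entries = entries


def _determinant(x):
    # Leading underscore: internal to this module by convention (2x2 case only here).
    e = x.entries
    return e[0][0] * e[1][1] - e[0][1] * e[1][0]


def multiply(x, y):
    # Public, module-level dyadic function: client code calls matrix.multiply(a, b).
    rows, inner, cols = len(x.entries), len(y.entries), len(y.entries[0])
    product = [[sum(x.entries[r][k] * y.entries[k][c] for k in range(inner))
                for c in range(cols)]
               for r in range(rows)]
    return Matrix(product)

Client code then does import matrix and calls matrix.multiply(a, b), while _determinant() stays out of the advertised interface even though nothing technically prevents calling it.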
A question regarding keyword arguments in torch.nn.Sequential: is it possible to forward keyword arguments to specific models in a sequence?
model = torch.nn.Sequential(model_0, MaxPoolingChannel(1))
res = model(input_ids_2, keyword_test=mask)
Here, keyword_test should be forwarded only to the first model.
Thanks a lot and best regards!
My duplicate from https://discuss.pytorch.org/t/keyword-arguments-in-torch-nn-sequential/53282
No, you cannot. This is only possible if all models passed to nn.Sequential expect the argument you are trying to pass in their forward methods (at least at the time of writing this).
Two workarounds could be (I'm not aware of your whole use case, only what I anticipated from the question):
If your value is static, why not initialize your first model with that value and access it during computation via self.keyword_test (see the sketch after this list).
In case the value is dynamic, you could attach it as a property of the input; you can then likewise access it during computation via input_ids_2.keyword_test.
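A minimal sketch of the first workaround (the module and the way the mask is used are hypothetical; only the pattern of storing the value at init matters):

import torch
import torch.nn as nn

class FirstModel(nn.Module):
    # Hypothetical stand-in for model_0: the static value is stored at init
    # instead of being passed as a keyword argument through nn.Sequential.
    def __init__(self, keyword_test):
        super().__init__()
        self.keyword_test = keyword_test
        self.linear = nn.Linear(16, 16)

    def forward(self, x):
        out = self.linear(x)
        return out * self.keyword_test   # use the stored value during computation

mask = torch.ones(16)
model = nn.Sequential(FirstModel(keyword_test=mask), nn.ReLU())
res = model(torch.randn(2, 16))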
I suppose there could be historical reasons for this naming, and that other languages have a similar feature, but it also seems to me that parameters have always had a name in C#; arguments are the unnamed ones. Or is there a particular reason why this terminology was chosen?
Oh, you wanted arguments! Sorry, this is parameters - arguments are two doors down the hall on the left.
Yes, you're absolutely right (to my mind, anyway). Ironically, although I'm usually picky about these terms, I still use "parameter passing" when I should probably talk about "argument passing". I suppose one could argue that prior to C# 4.0, if you're calling a method you don't care about the parameter names, whereas the names become part of the significant metadata when you can specify them on the arguments as well.
I agree that it makes a difference, and that terminology is important.
"Optional parameters" is definitely okay though - that's adding metadata to the parameter when you couldn't do so before :) (Having said that, it's not going to be optional in terms of the generated IL...)
Would you like me to ask the team for their feedback?
I don't think so. The names are quite definitely the names of parameters, as they are defined and given a specific meaning in the method definition, where they are properly called the parameters to the method. At the call site, arguments can now be tagged with the name of the parameter that they supply a value for.
The new term refers to the perspective of the method caller - which is logical because that's where the feature applies. Previously, callers only had to think of parameters as being "positional parameters". Now they can optionally treat them as "named parameters" - hence the name.
I don't know if it's worth adding now, but MS calls them named arguments anyway. See Named and Optional Arguments.