I'm looking to implement a custom autograd function: one where the backward pass is a mix of a custom derivative and the derivative of a function that torch should be able to find by itself.
For a simple example, say I wanted to create a function for y = x * exp(x)
class custom_function(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * torch.exp(x)

    @staticmethod
    def backward(ctx, grad_output):
        x = ctx.saved_tensors[0]
        custom_derivative = x * [d/dx torch.exp(x)] + torch.exp(x)  # pseudocode: how do I get this derivative?
        return grad_output * custom_derivative
How would one call the derivative of a known pytorch function in the backward pass?
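One common way to do this (a sketch, not necessarily the only approach): re-enable gradient tracking inside backward with torch.enable_grad() and let torch.autograd.grad differentiate torch.exp for you, then combine that result with the hand-written part of the derivative.

```python
import torch

class CustomFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * torch.exp(x)

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        # Gradients are normally disabled inside backward; re-enable them
        # so autograd can build a graph for torch.exp and differentiate it.
        with torch.enable_grad():
            x_ = x.detach().requires_grad_(True)
            y = torch.exp(x_)
            # d/dx torch.exp(x), computed by autograd instead of by hand
            exp_grad, = torch.autograd.grad(y, x_, grad_outputs=torch.ones_like(y))
        # product rule: d/dx [x * exp(x)] = x * exp'(x) + exp(x)
        custom_derivative = x * exp_grad + torch.exp(x)
        return grad_output * custom_derivative

x = torch.tensor(1.0, requires_grad=True)
CustomFunction.apply(x).backward()
print(x.grad)  # (1 + 1) * e ≈ 5.4366
```

The detach/requires_grad_ dance gives the inner graph its own leaf, so the gradient of the sub-expression is computed locally rather than flowing back into the outer graph twice.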
I'm a noob to coding and just began my journey. I started with Python OOP and ran into some trouble.
class Multidiv:
    def __init__(self, mulitple):
        self.mulitple = mulitple

    def mulitple(self, x, y):
        return x * y

    def divide(self, x, y):
        pass

math = Multidiv(mulitple, 10, 5)
print(math)
I keep getting a NameError and I don't understand why. Please help.
There's a lot of mess in your code, so I suggest going back to reading documentation / watching videos.
For starters, mulitple is not defined at the point where you call Multidiv(mulitple, 10, 5).
Secondly, you're passing 10 and 5 but ignoring them in the __init__ function; they never reach the mulitple method.
You can do what you're trying to achieve like this:
class Multidiv:
    def __init__(self, mulitple):
        self.action = mulitple  # save the method name as a member of the object

    def mulitple(self, x, y):
        return x * y

    def divide(self, x, y):
        pass

math = Multidiv('mulitple')  # pass the action name in quotes, as a string, otherwise it won't be recognized
actual_action = getattr(math, math.action)  # use the builtin getattr to get the wanted method off the object
print(actual_action(10, 5))  # call the actual action with the parameters you wish to calculate
I saw the following code segment for extending nn.Module. What I do not understand is the input_ @ self.weight in the forward function. I can understand that it is trying to use the weight information with input_, but @ is usually used for decorators, so why can it be used this way?
class Linear(nn.Module):
    def __init__(self, in_size, out_size):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_size, out_size))
        self.bias = nn.Parameter(torch.randn(out_size))

    def forward(self, input_):
        return self.bias + input_ @ self.weight
linear = Linear(5, 2)
assert isinstance(linear, nn.Module)
assert not isinstance(linear, PyroModule)
example_input = torch.randn(100, 5)
example_output = linear(example_input)
assert example_output.shape == (100, 2)
The @ is shorthand for the __matmul__ method: the matrix multiplication operator.
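A quick sanity check (a minimal sketch) that @ and __matmul__ really are the same operation:

```python
import torch

a = torch.randn(3, 4)
b = torch.randn(4, 2)

# `a @ b` dispatches to a.__matmul__(b), which is torch.matmul
assert torch.equal(a @ b, a.__matmul__(b))
assert torch.equal(a @ b, torch.matmul(a, b))
print((a @ b).shape)  # torch.Size([3, 2])
```

So in the Linear module above, input_ of shape (100, 5) matrix-multiplied by weight of shape (5, 2) yields the (100, 2) output the assert expects.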
I am providing a minimal example of what I want to solve. I have defined a class and in that there are some variables defined across different functions. I want to know how to track those variables across functions to get the gradient. I think I have to use tf.GradientTape but I have tried some variants without success.
class A():
    def __init__(self):
        self.alpha = tf.Variable(2.0)

    def f1(self):
        wt = self.alpha * 5.0
        return wt

    def f2(self):
        wt_f1 = f1()
        with tf.GradientTape() as tape:
            wt_f2 = wt_f1 * 10.0
            print(tape.gradient(wt_f2, self.alpha))

a = A()
print(a.f2())
The last line prints None. Clearly the derivative of wt_f2 with respect to alpha is 50.0, yet I get None. I tried initializing a persistent GradientTape in the __init__ function and using it to watch variables such as wt and self.alpha, but that didn't help. Any idea?
Update 1:
Putting the wt_f1 call under the tape does not work either.
class A():
    def __init__(self):
        self.alpha = tf.Variable(2.0)

    def f1(self):
        wt = self.alpha * 5.0
        return wt

    def f2(self):
        with tf.GradientTape() as tape:
            wt_f1 = f1()
            wt_f2 = wt_f1 * 10.0
            print(tape.gradient(wt_f2, self.alpha))
This also returns None.
You are printing None because f2() returns nothing, so print(a.f2()) gives you an extra None.
Remove the outer print:
a = A()
a.f2()
Furthermore, some edits to your code would be good:
You missed the self before the f1() call; it only works because you happen to have an f1 function defined somewhere else. Either way, make it self.f1().
Move the print statement outside of the tape scope, because it's better to take the gradient once recording is finished.
Add tape.watch() to ensure the variable is being traced by the tape.
class A():
    def __init__(self):
        self.alpha = tf.Variable(2.0)

    def f1(self):
        wt = self.alpha * 5.0
        return wt

    def f2(self):
        with tf.GradientTape() as tape:
            tape.watch(self.alpha)
            wt_f1 = self.f1()
            wt_f2 = wt_f1 * 10.0
        print(tape.gradient(wt_f2, self.alpha))
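Putting those edits together, a condensed sketch where f2() returns the gradient instead of printing it, so the caller sees the expected 50.0:

```python
import tensorflow as tf

class A():
    def __init__(self):
        self.alpha = tf.Variable(2.0)

    def f1(self):
        return self.alpha * 5.0

    def f2(self):
        with tf.GradientTape() as tape:
            wt_f2 = self.f1() * 10.0  # self.f1() runs while the tape is recording
        # take the gradient after recording has finished
        return tape.gradient(wt_f2, self.alpha)

a = A()
print(float(a.f2()))  # 50.0
```

The key difference from the original is that self.f1() executes inside the with block, so the multiplication by alpha is recorded on the tape; tape.watch(self.alpha) is optional here since trainable tf.Variable objects are watched automatically.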
Consider the two classes below.
class Alpha:
    def __init__(self):
        pass

    def Bar(self, x):
        def Foo(mult):
            return x * mult
        self._Foo = Foo

    def Foo(self, mult):
        return self._Foo(mult)

class Beta:
    def __init__(self):
        pass

    def Bar(self, x):
        self._x = x

    def Foo(self, mult):
        return self._x * mult
For Alpha with a deferred function _Foo, I believe it is more efficient memory-wise since it only evaluates x when the function is called. For Beta on the other hand, x is stored explicitly as a class attribute.
The question is, where exactly is x stored in Alpha? How efficient is it compared to Beta?
x is not stored on the Alpha instance as a normal attribute; it is captured by the closure of Foo and kept alive in a cell object (visible via self._Foo.__closure__) for as long as _Foo exists. So Alpha does not actually avoid storing x, it just stores it less visibly, and memory-wise the two classes are roughly equivalent.
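You can see exactly where x lives in the Alpha case: CPython keeps each free variable of the inner Foo in a cell object attached to the function, reachable through __closure__. A small sketch:

```python
class Alpha:
    def Bar(self, x):
        def Foo(mult):
            return x * mult
        self._Foo = Foo

    def Foo(self, mult):
        return self._Foo(mult)

a = Alpha()
a.Bar(10)

# x is held in a closure cell on the stored function object
cell, = a._Foo.__closure__
print(cell.cell_contents)  # 10
print(a.Foo(5))            # 50
```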
I have three features:
feature_one -> number of tokens in the given sentence.
feature_two -> number of verbs in the given sentence.
feature_three -> number of tokens minus number of verbs in the given sentence (feature_one - feature_two).
I have written custom transformers for feature_one and feature_two, and I want to write a custom transformer for feature_three such that it can use the results of feature_one and feature_two when running the pipeline as:
Pipeline([
    # input to feature_one and feature_two is a list of sentences
    ("feature", FeatureUnion([
        ("feature_one", feature_one_transformer()),
        ("feature_two", feature_two_transformer())
    ])),
    ("feature_three", feature_three_transformer())
])
feature_one_transformer:
class feature_one_transformer(BaseEstimator, TransformerMixin):
    def __init__(self):
        pass

    def fit(self, x, y=None):
        return self

    def transform(self, sentence_list):
        number_of_tokens_in_sentence_list = list()
        for sentence in sentence_list:
            number_of_tokens = compute_number_of_tokens(sentence)
            number_of_tokens_in_sentence_list.append(number_of_tokens)
        return pandas.DataFrame(number_of_tokens_in_sentence_list)
feature_two_transformer:
class feature_two_transformer(BaseEstimator, TransformerMixin):
    def __init__(self):
        pass

    def fit(self, x, y=None):
        return self

    def transform(self, sentence_list):
        number_of_verbs_in_sentence_list = list()
        for sentence in sentence_list:
            number_of_verbs = compute_number_of_verbs_in_sentence(sentence)
            number_of_verbs_in_sentence_list.append(number_of_verbs)
        return pandas.DataFrame(number_of_verbs_in_sentence_list)
Can somebody tell me how I should write the custom transformer for feature_three, and how to use it in the pipeline so that it can consume the results of the feature_one and feature_two transformers? Thank you.
It's not clear to me why you would want to make this so complicated. I would just use one transformer that does everything you want. Something like this:
class features_transformer(BaseEstimator, TransformerMixin):
    def __init__(self, variable):
        self.variable = variable

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X['number_of_tokens'] = X[self.variable].apply(lambda cell: compute_number_of_tokens(cell))
        X['number_of_verbs'] = X[self.variable].apply(lambda cell: compute_number_of_verbs(cell))
        X['tokens_minus_verbs'] = X['number_of_tokens'] - X['number_of_verbs']
        return X

new_X = features_transformer('sentences').fit_transform(X)
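For illustration, here is a runnable sketch of that transformer. The compute_number_of_tokens and compute_number_of_verbs helpers below are stand-ins (the real implementations are the asker's own, presumably using a tokenizer and a POS tagger):

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

# stand-in helpers, for demonstration only
def compute_number_of_tokens(sentence):
    return len(sentence.split())

def compute_number_of_verbs(sentence):
    verbs = {"is", "run"}  # a real version would use a POS tagger
    return sum(tok in verbs for tok in sentence.split())

class features_transformer(BaseEstimator, TransformerMixin):
    def __init__(self, variable):
        self.variable = variable

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = X.copy()  # avoid mutating the caller's DataFrame
        X['number_of_tokens'] = X[self.variable].apply(compute_number_of_tokens)
        X['number_of_verbs'] = X[self.variable].apply(compute_number_of_verbs)
        X['tokens_minus_verbs'] = X['number_of_tokens'] - X['number_of_verbs']
        return X

X = pd.DataFrame({'sentences': ['the cat is here', 'run fast']})
new_X = features_transformer('sentences').fit_transform(X)
print(new_X['tokens_minus_verbs'].tolist())  # [3, 1]
```

Because all three columns are computed in one transform, the third feature can read the first two directly, with no need to thread intermediate results through a FeatureUnion.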