Keras - using activation function with a parameter - keras

How is it possible to use leaky ReLUs in the newest version of keras?
Function relu() accepts an optional parameter 'alpha', that is responsible for the negative slope, but I cannot figure out how to pass ths paramtere when constructing a layer.
This line is how I tried to do it,
model.add(Activation(relu(alpha=0.1))
but then I get the error
TypeError: relu() missing 1 required positional argument: 'x'
How can I use a leaky ReLU, or any other activation function with some parameter?

relu is a function and not a class and it takes the input to the activation function as the parameter x. The activation layer takes a function as the argument, so you could initialize it with a lambda function through input x for example:
model.add(Activation(lambda x: relu(x, alpha=0.1)))

Well, from this source (keras doc), and this github question , you use a linear activation then you put the leaky relu as another layer right after.
from keras.layers.advanced_activations import LeakyReLU
model.add(Dense(512, 512, activation='linear')) # Add any layer, with the default of an identity/linear squashing function (no squashing)
model.add(LeakyReLU(alpha=.001)) # add an advanced activation
does that help?

You can build a wrapper for parameterized activations functions. I've found this useful and more intuitive.
class activation_wrapper(object):
def __init__(self, func):
self.func = func
def __call__(self, *args, **kwargs):
def _func(x):
return self.func(x, *args, **kwargs)
return _func
Of course I could have used a lambda expression in call.
Then
wrapped_relu = activation_wrapper(relu).
Then use it as you have above
model.add(Activation(wrapped_relu(alpha=0.1))
You can also use it as part of a layer
model.add(Dense(64, activation=wrapped_relu(alpha=0.1))
While this solution is a little more complicated than the one offered by #Thomas Jungblut, the wrapper class can be reused for any parameterized activation function. In fact, I used it whenever I have a family of activation functions that are parameterized.

Keras defines separate activation layers for the most common use cases, including LeakyReLU, ThresholdReLU, ReLU (which is a generic version that supports all ReLU parameters), among others. See the full documentation here: https://keras.io/api/layers/activation_layers
Example usage with the Sequential model:
import tensorflow as tf
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.InputLayer(10))
model.add(tf.keras.layers.Dense(16))
model.add(tf.keras.layers.LeakyReLU(0.2))
model.add(tf.keras.layers.Dense(1))
model.add(tf.keras.layers.Activation(tf.keras.activations.sigmoid))
model.compile('adam', 'binary_crossentropy')
If the activation parameter you want to use is unavailable as a predefined class, you could use a plain lambda expression as suggested by #Thomas Jungblut:
from tensorflow.keras.layers import Activation
model.add(Activation(lambda x: tf.keras.activations.relu(x, alpha=0.2)))
However, as noted by #leenremm in the comments, this fails when trying to save or load the model. As suggested you could use the Lambda layer as follows:
from tensorflow.keras.layers import Activation, Lambda
model.add(Activation(Lambda(lambda x: tf.keras.activations.relu(x, alpha=0.2))))
However, the Lambda documentation includes the following warning:
WARNING: tf.keras.layers.Lambda layers have (de)serialization limitations!
The main reason to subclass tf.keras.layers.Layer instead of using a Lambda layer is saving and inspecting a Model. Lambda layers are saved by serializing the Python bytecode, which is fundamentally non-portable. They should only be loaded in the same environment where they were saved. Subclassed layers can be saved in a more portable way by overriding their get_config method. Models that rely on subclassed Layers are also often easier to visualize and reason about.
As such, the best method for activations not already provided by a layer is to subclass tf.keras.layers.Layer instead. This should not be confused with subclassing object and overriding __call__ as done in #Anonymous Geometer's answer, which is the same as using a lambda without the Lambda layer.
Since my use case is covered by the provided layer classes, I'll leave it up to the reader to implement this method. I am making this answer a community wiki in the event anyone would like to provide an example below.

Related

how to access value of a learned parameter of an activation function in a sequential network

I am implementing my custom activation function with learnable parameters. For example this can be similar to PReLu https://pytorch.org/docs/stable/generated/torch.nn.PReLU.html which a learnable parameter a.
How can I access/view the a parameter value after training ?
Solution the issue was related to the fact that it was a sequential network and this helped solve the issue https://discuss.pytorch.org/t/access-weights-of-a-specific-module-in-nn-sequential/3627
The nn.PReLU layer is a nn.Module, like most other layers you can access the weights directly using the weight property.
>>> act = nn.PReLU()
>>> act.weight
Parameter containing:
tensor([0.2500], requires_grad=True)

Pytorch how use a linear activation function

In Keras, I can create any network layer with a linear activation function as follows (for example, a fully-connected layer is taken):
model.add(keras.layers.Dense(outs, input_shape=(160,), activation='linear'))
But I can't find the linear activation function in the PyTorch documentation. ReLU is not suitable, because there are negative values in my sample. How do I create a layer with a linear activation function in PyTorch?
If you take a look at the Keras documentation, you will see tf.keras.layers.Dense's activation='linear' corresponds to the a(x) = x function. Which means no non-linearity.
So in PyTorch, you just define the linear function without adding any activation layer:
torch.nn.Linear(160, outs)
activation='linear' is equivavlent to no activation at all.
As can be seen here, it is also called "passthrough", meaning the it does nothing.
So in pytorch you can simply not apply any activation at all, to be in parity.
However, as already told by #Minsky, hidden layer without real activation, i.e. some non-linear activation is useless. It is like changing the weights which is anyway done during the network taining.
As already answered you don't need a linear activation layer in pytorch. But if you need to include it, you can write a custom one, that passes the output as follows.
class linear(torch.nn.Module):
# a linear activation function based on y=x
def forward(self, output):return output
The you can call it like any other activation function.
linear()(torch.tensor([1,2,3])) == nn.ReLU()(torch.tensor([1,2,3]))

keras layer that computes logarithms?

I'd like to set up a Keras layer in which each node simply computes the logarithm of the corresponding node in the preceding layer. I see from the Keras documentation that there is a "log" function in its backend module. But somehow I'm not understanding how to use this.
Thanks in advance for any hints you can offer!
You can use any backend function inside a Lambda layer:
from keras.layers import Lambda
import keras.backend as K
Define just any function taking the input tensor:
def logFunc(x):
return K.log(x)
And create a lambda layer with it:
#add to the model the way you're used to:
model.add(Lambda(logFunc,output_shape=(necessaryWithTheano)))
And if the function is already defined, taking only one argument and returning a tensor, you don't need to create your own function, just Lambda(K.log), for instance.

Custom loss function in PyTorch

I have three simple questions.
What will happen if my custom loss function is not differentiable? Will pytorch through error or do something else?
If I declare a loss variable in my custom function which will represent the final loss of the model, should I put requires_grad = True for that variable? or it doesn't matter? If it doesn't matter, then why?
I have seen people sometimes write a separate layer and compute the loss in the forward function. Which approach is preferable, writing a function or a layer? Why?
I need a clear and nice explanation to these questions to resolve my confusions. Please help.
Let me have a go.
This depends on what you mean by "non-differentiable". The first definition that makes sense here is that PyTorch doesn't know how to compute gradients. If you try to compute gradients nevertheless, this will raise an error. The two possible scenarios are:
a) You're using a custom PyTorch operation for which gradients have not been implemented, e.g. torch.svd(). In that case you will get a TypeError:
import torch
from torch.autograd import Function
from torch.autograd import Variable
A = Variable(torch.randn(10,10), requires_grad=True)
u, s, v = torch.svd(A) # raises TypeError
b) You have implemented your own operation, but did not define backward(). In this case, you will get a NotImplementedError:
class my_function(Function): # forgot to define backward()
def forward(self, x):
return 2 * x
A = Variable(torch.randn(10,10))
B = my_function()(A)
C = torch.sum(B)
C.backward() # will raise NotImplementedError
The second definition that makes sense is "mathematically non-differentiable". Clearly, an operation which is mathematically not differentiable should either not have a backward() method implemented or a sensible sub-gradient. Consider for example torch.abs() whose backward() method returns the subgradient 0 at 0:
A = Variable(torch.Tensor([-1,0,1]),requires_grad=True)
B = torch.abs(A)
B.backward(torch.Tensor([1,1,1]))
A.grad.data
For these cases, you should refer to the PyTorch documentation directly and dig out the backward() method of the respective operation directly.
It doesn't matter. The use of requires_gradis to avoid unnecessary computations of gradients for subgraphs. If there’s a single input to an operation that requires gradient, its output will also require gradient. Conversely, only if all inputs don’t require gradient, the output also won’t require it. Backward computation is never performed in the subgraphs, where all Variables didn’t require gradients.
Since, there are most likely some Variables (for example parameters of a subclass of nn.Module()), your loss Variable will also require gradients automatically. However, you should notice that exactly for how requires_grad works (see above again), you can only change requires_grad for leaf variables of your graph anyway.
All the custom PyTorch loss functions, are subclasses of _Loss which is a subclass of nn.Module. See here. If you'd like to stick to this convention, you should subclass _Loss when defining your custom loss function. Apart from consistency, one advantage is that your subclass will raise an AssertionError, if you haven't marked your target variables as volatile or requires_grad = False. Another advantage is that you can nest your loss function in nn.Sequential(), because its a nn.Module I would recommend this approach for these reasons.

Missing `set_input` in keras

I notice that in https://blog.keras.io/keras-as-a-simplified-interface-to-tensorflow-tutorial.html it says that we can set tensorflow op as input of keras model like: first_layer.set_input(my_input_tensor). But I find that keras does not have set_input function:
first_layer = Dense(32, activation='relu', input_dim=784)
first_layer.set_input(my_input_tensor)
But I get:
AttributeError: 'Dense' object has no attribute 'set_input'.
What may be the problem?
I guess set_input() method is removed in latest versions of Keras. If you see this documentation of Keras, there is a function called set_input() function of keras.layers.containers.Sequential class. But its source code is no longer available on Github.
If you look at the source code of Dense layer class in Keras, you will see that there is no such method called set_input() as well. If you also see the source of abstract class Layer which is the base class for Dense layer, you will see there is no such function called set_input().
So, we can conclude, set_input() method is probably no longer available in Keras.

Resources